From sdinardo at ebi.ac.uk Wed Nov 5 10:15:59 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 05 Nov 2014 10:15:59 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs Message-ID: <5459F8DF.2090806@ebi.ac.uk>

Hello again,
to understand GPFS better, I recently built a test GPFS cluster using some old hardware that was going to be retired. The storage was SAN devices, so instead of using native RAID I went for old-school GPFS. The configuration is basically:

3x servers
3x SAN storages
2x SAN switches

I did no zoning, so all the servers can see all the LUNs, but on NSD creation I gave each LUN a primary, secondary and tertiary server, with the following rule:

STORAGE     primary    secondary    tertiary
storage1    server1    server2      server3
storage2    server2    server3      server1
storage3    server3    server1      server2

Looking at mmcrnsd, it was my understanding that the primary server is the one that writes to the NSD unless it fails, in which case the next server takes ownership of the LUN.

Now comes the question: when I ran a dd from server1, I was surprised to discover that server1 was writing to all the LUNs; the other two servers were doing nothing. This behaviour surprised me because on GSS only the RG owner can write, so one server "asks" the other servers to write to their own RGs. In fact, on GSS you can see a lot of Ethernet traffic and I/O on each server. While I understand that the situation is different, I'm puzzled by the fact that all the servers seem able to write to all the LUNs.

SAN devices usually should be connected to one server only, as parallel access could create data corruption. In environments where you connect a SAN to multiple servers (for example a VMware cloud), it is the software's task to prevent the servers from overwriting each other (and corrupting data).

Honestly, what I was expecting is: server1 writing to its own LUNs, and data traffic (Ethernet) to the other two servers, basically asking *them* to write to the other LUNs. I don't know if this behaviour is normal or not. I tried to find documentation about it, but could not find any.

Could somebody tell me if this "every server writes to all the LUNs" behaviour is intended or not?

Thanks in advance,
Salvatore

From viccornell at gmail.com Wed Nov 5 10:22:38 2014 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 5 Nov 2014 10:22:38 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com>

Hi Salvatore,

If you are doing the IO on the NSD server itself and it can see all of the NSDs it will use its "local" access to write to the LUNs.

You need some GPFS clients to see the workload spread across all of the NSD servers.

Vic

> On 5 Nov 2014, at 10:15, Salvatore Di Nardo wrote:
>
> Hello again,
> to understand better GPFS, recently i build up an test gpfs cluster using some old hardware that was going to be retired. THe storage was SAN devices, so instead to use native raids I went for the old school gpfs. the configuration is basically:
>
> 3x servers
> 3x san storages
> 2x san switches
>
> I did no zoning, so all the servers can see all the LUNs, but on nsd creation I gave each LUN a primary, secondary and third server.
with the following rule: > > STORAGE > primary > secondary > tertiary > storage1 > server1 > server2 server3 > storage2 server2 server3 server1 > storage3 server3 server1 server2 > > looking at the mmcrnsd, it was my understanding that the primary server is the one that wrote on the NSD unless it fails, then the following server take the ownership of the lun. > > Now come the question: > when i did from server 1 a dd surprisingly i discovered that server1 was writing to all the luns. the other 2 server was doing nothing. this behaviour surprises me because on GSS only the RG owner can write, so one server "ask" the other server to write to his own RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each server. While i understand that the situation it's different I'm puzzled about the fact that all the servers seems able to write to all the luns. > > SAN deviced usually should be connected to one server only, as paralled access could create data corruption. In environments where you connect a SAN to multiple servers ( example VMWARE cloud) its softeware task to avoid data overwriting between server ( and data corruption ). > > Honestly, what i was expecting is: server1 writing on his own luns, and data traffic ( ethernet) to the other 2 server , basically asking them to write on the other luns. I dont know if this behaviour its normal or not. I triied to find a documentation about that, but could not find any. > > Could somebody tell me if this "every server write to all the luns" its intended or not? > > Thanks in advance, > Salvatore > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Wed Nov 5 10:25:07 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Wed, 05 Nov 2014 11:25:07 +0100 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <5459FB03.8080801@ugent.be> yes, this behaviour is normal, and a bit annoying sometimes, but GPFS doesn't really like (or isn't designed) to run stuff on the NSDs directly. the GSS probably send the data to the other NSD to distribute the (possible) compute cost from the raid, where there is none for regular LUN access. (but you also shouldn't be running on the GSS NSDs ;) stijn On 11/05/2014 11:15 AM, Salvatore Di Nardo wrote: > Hello again, > to understand better GPFS, recently i build up an test gpfs cluster > using some old hardware that was going to be retired. THe storage was > SAN devices, so instead to use native raids I went for the old school > gpfs. the configuration is basically: > > 3x servers > 3x san storages > 2x san switches > > I did no zoning, so all the servers can see all the LUNs, but on nsd > creation I gave each LUN a primary, secondary and third server. with the > following rule: > > STORAGE > primary > secondary > tertiary > storage1 > server1 > server2 server3 > storage2 server2 server3 server1 > storage3 server3 server1 server2 > > > > looking at the mmcrnsd, it was my understanding that the primary server > is the one that wrote on the NSD unless it fails, then the following > server take the ownership of the lun. > > Now come the question: > when i did from server 1 a dd surprisingly i discovered that server1 was > writing to all the luns. 
the other 2 server was doing nothing. this > behaviour surprises me because on GSS only the RG owner can write, so > one server "ask" the other server to write to his own RG's.In fact on > GSS can be seen a lot of ETH traffic and io/s on each server. While i > understand that the situation it's different I'm puzzled about the fact > that all the servers seems able to write to all the luns. > > SAN deviced usually should be connected to one server only, as paralled > access could create data corruption. In environments where you connect a > SAN to multiple servers ( example VMWARE cloud) its softeware task to > avoid data overwriting between server ( and data corruption ). > > Honestly, what i was expecting is: server1 writing on his own luns, and > data traffic ( ethernet) to the other 2 server , basically asking *them* > to write on the other luns. I dont know if this behaviour its normal or > not. I triied to find a documentation about that, but could not find any. > > Could somebody tell me if this _/"every server write to all the luns"/_ > its intended or not? > > Thanks in advance, > Salvatore > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From kgunda at in.ibm.com Wed Nov 5 10:25:07 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Wed, 5 Nov 2014 15:55:07 +0530 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: In case of SAN connectivity, all nodes can write to disks. This avoids going over the network to get to disks. Only when local access isn't present either due to connectivity or zoning will it use the defined NSD server. If there is a need to have the node always use a NSD server, you can enforce it via mount option -o usensdserver=always If the first nsd server is down, it will use the next NSD server in the list. In general NSD servers are a priority list of servers rather than a primary/secondary config which is the case when using native raid. Also note that multiple nodes accessing the same disk will not cause corruption as higher level token mgmt in GPFS will take care of data consistency. Regards Kalyan C Gunda STSM, Elastic Storage Development Member of The IBM Academy of Technology EGL D Block, Bangalore From: Salvatore Di Nardo To: gpfsug main discussion list Date: 11/05/2014 03:44 PM Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs Sent by: gpfsug-discuss-bounces at gpfsug.org Hello again, to understand better GPFS, recently i build up an test gpfs cluster using some old hardware that was going to be retired. THe storage was SAN devices, so instead to use native raids I went for the old school gpfs. the configuration is basically: 3x servers 3x san storages 2x san switches I did no zoning, so all the servers can see all the LUNs, but on nsd creation I gave each LUN a primary, secondary and third server. 
with the following rule: |-------------------+---------------+--------------------+---------------| |STORAGE |primary |secondary |tertiary | |-------------------+---------------+--------------------+---------------| |storage1 |server1 |server2 |server3 | |-------------------+---------------+--------------------+---------------| |storage2 |server2 |server3 |server1 | |-------------------+---------------+--------------------+---------------| |storage3 |server3 |server1 |server2 | |-------------------+---------------+--------------------+---------------| looking at the mmcrnsd, it was my understanding that the primary server is the one that wrote on the NSD unless it fails, then the following server take the ownership of the lun. Now come the question: when i did from server 1 a dd surprisingly i discovered that server1 was writing to all the luns. the other 2 server was doing nothing. this behaviour surprises me because on GSS only the RG owner can write, so one server "ask" the other server to write to his own RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each server. While i understand that the situation it's different I'm puzzled about the fact that all the servers seems able to write to all the luns. SAN deviced usually should be connected to one server only, as paralled access could create data corruption. In environments where you connect a SAN to multiple servers ( example VMWARE cloud) its softeware task to avoid data overwriting between server ( and data corruption ). Honestly, what? i was expecting is: server1 writing on his own luns, and data traffic ( ethernet) to the other 2 server , basically asking them to write on the other luns. I dont know if this behaviour its normal or not. I triied to find a documentation about that, but could not find any. Could somebody? tell me if this "every server write to all the luns" its intended or not? Thanks in advance, Salvatore_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sdinardo at ebi.ac.uk Wed Nov 5 10:33:57 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 05 Nov 2014 10:33:57 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> References: <5459F8DF.2090806@ebi.ac.uk> <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> Message-ID: <5459FD15.3070105@ebi.ac.uk> I understand that my test its a bit particular because the client was also one of the servers. Usually clients don't have direct access to the storages, but still it made think, hot the things are supposed to work. For example i did another test with 3 dd's, one each server. All the servers was writing to all the luns. In other words a lun was accessed in parallel by 3 servers. Its that a problem, or gpfs manage properly the concurrency and avoid data corruption? I'm asking because i was not expecting a server to write to an NSD he doesn't own, even if its locally available. I thought that the general availablity was for failover, not for parallel access. Regards, Salvatore On 05/11/14 10:22, Vic Cornell wrote: > Hi Salvatore, > > If you are doing the IO on the NSD server itself and it can see all of > the NSDs it will use its "local? access to write to the LUNS. > > You need some GPFS clients to see the workload spread across all of > the NSD servers. 
> > Vic > > > >> On 5 Nov 2014, at 10:15, Salvatore Di Nardo > > wrote: >> >> Hello again, >> to understand better GPFS, recently i build up an test gpfs cluster >> using some old hardware that was going to be retired. THe storage was >> SAN devices, so instead to use native raids I went for the old school >> gpfs. the configuration is basically: >> >> 3x servers >> 3x san storages >> 2x san switches >> >> I did no zoning, so all the servers can see all the LUNs, but on nsd >> creation I gave each LUN a primary, secondary and third server. with >> the following rule: >> >> STORAGE >> primary >> secondary >> tertiary >> storage1 >> server1 >> server2 server3 >> storage2 server2 server3 server1 >> storage3 server3 server1 server2 >> >> >> >> looking at the mmcrnsd, it was my understanding that the primary >> server is the one that wrote on the NSD unless it fails, then the >> following server take the ownership of the lun. >> >> Now come the question: >> when i did from server 1 a dd surprisingly i discovered that server1 >> was writing to all the luns. the other 2 server was doing nothing. >> this behaviour surprises me because on GSS only the RG owner can >> write, so one server "ask" the other server to write to his own >> RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each >> server. While i understand that the situation it's different I'm >> puzzled about the fact that all the servers seems able to write to >> all the luns. >> >> SAN deviced usually should be connected to one server only, as >> paralled access could create data corruption. In environments where >> you connect a SAN to multiple servers ( example VMWARE cloud) its >> softeware task to avoid data overwriting between server ( and data >> corruption ). >> >> Honestly, what i was expecting is: server1 writing on his own luns, >> and data traffic ( ethernet) to the other 2 server , basically asking >> *them* to write on the other luns. I dont know if this behaviour its >> normal or not. I triied to find a documentation about that, but could >> not find any. >> >> Could somebody tell me if this _/"every server write to all the >> luns"/_ its intended or not? >> >> Thanks in advance, >> Salvatore >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Nov 5 10:38:48 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 05 Nov 2014 10:38:48 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <1415183928.3474.4.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-11-05 at 10:15 +0000, Salvatore Di Nardo wrote: [SNIP] > Now come the question: > when i did from server 1 a dd surprisingly i discovered that server1 > was writing to all the luns. the other 2 server was doing nothing. > this behaviour surprises me because on GSS only the RG owner can > write, so one server "ask" the other server to write to his own > RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each > server. 
While i understand that the situation it's different I'm > puzzled about the fact that all the servers seems able to write to all > the luns. The difference is that in GSS the NSD servers are in effect doing software RAID on the disks. Therefore they and they alone can write to the NSD. In the traditional setup the NSD is on a RAID device on SAN controller and multiple machines are able to access the block device at the same time with token management in GPFS preventing corruption. I guess from a technical perspective you could have the GSS software RAID distributed between the NSD servers, but that would be rather more complex software and it is no surprise IBM have gone down the easy route. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From viccornell at gmail.com Wed Nov 5 10:42:22 2014 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 5 Nov 2014 10:42:22 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459FD15.3070105@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> <5459FD15.3070105@ebi.ac.uk> Message-ID: <75801708-F65D-4B39-82CA-6DC4FB5AA6EB@gmail.com> > On 5 Nov 2014, at 10:33, Salvatore Di Nardo wrote: > > I understand that my test its a bit particular because the client was also one of the servers. > Usually clients don't have direct access to the storages, but still it made think, hot the things are supposed to work. > > For example i did another test with 3 dd's, one each server. All the servers was writing to all the luns. > In other words a lun was accessed in parallel by 3 servers. > > Its that a problem, or gpfs manage properly the concurrency and avoid data corruption? Its not a problem if you use locks. Remember the clients - even the ones running on the NSD servers are talking to the filesystem - not to the LUNS/NSDs directly. It is the NSD processes that talk to the NSDs. So loosely speaking it is as if all of the processes you are running were running on a single system with a local filesystem So yes - gpfs is designed to manage the problems created by having a distributed, shared filesystem, and does a pretty good job IMHO. > I'm asking because i was not expecting a server to write to an NSD he doesn't own, even if its locally available. > I thought that the general availablity was for failover, not for parallel access. Bear in mind that GPFS supports a number of access models, one of which is where all of the systems in the cluster have access to all of the disks. So parallel access is most commonly used for failover, but that is not the limit of its capabilities. Vic > > > Regards, > Salvatore > > > > On 05/11/14 10:22, Vic Cornell wrote: >> Hi Salvatore, >> >> If you are doing the IO on the NSD server itself and it can see all of the NSDs it will use its "local? access to write to the LUNS. >> >> You need some GPFS clients to see the workload spread across all of the NSD servers. >> >> Vic >> >> >> >>> On 5 Nov 2014, at 10:15, Salvatore Di Nardo > wrote: >>> >>> Hello again, >>> to understand better GPFS, recently i build up an test gpfs cluster using some old hardware that was going to be retired. THe storage was SAN devices, so instead to use native raids I went for the old school gpfs. the configuration is basically: >>> >>> 3x servers >>> 3x san storages >>> 2x san switches >>> >>> I did no zoning, so all the servers can see all the LUNs, but on nsd creation I gave each LUN a primary, secondary and third server. 
with the following rule: >>> >>> STORAGE >>> primary >>> secondary >>> tertiary >>> storage1 >>> server1 >>> server2 server3 >>> storage2 server2 server3 server1 >>> storage3 server3 server1 server2 >>> >>> looking at the mmcrnsd, it was my understanding that the primary server is the one that wrote on the NSD unless it fails, then the following server take the ownership of the lun. >>> >>> Now come the question: >>> when i did from server 1 a dd surprisingly i discovered that server1 was writing to all the luns. the other 2 server was doing nothing. this behaviour surprises me because on GSS only the RG owner can write, so one server "ask" the other server to write to his own RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each server. While i understand that the situation it's different I'm puzzled about the fact that all the servers seems able to write to all the luns. >>> >>> SAN deviced usually should be connected to one server only, as paralled access could create data corruption. In environments where you connect a SAN to multiple servers ( example VMWARE cloud) its softeware task to avoid data overwriting between server ( and data corruption ). >>> >>> Honestly, what i was expecting is: server1 writing on his own luns, and data traffic ( ethernet) to the other 2 server , basically asking them to write on the other luns. I dont know if this behaviour its normal or not. I triied to find a documentation about that, but could not find any. >>> >>> Could somebody tell me if this "every server write to all the luns" its intended or not? >>> >>> Thanks in advance, >>> Salvatore >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Wed Nov 5 10:46:52 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 05 Nov 2014 10:46:52 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <545A001C.1040908@ebi.ac.uk> On 05/11/14 10:25, Kalyan Gunda wrote: > Also note that multiple nodes accessing the same disk will not cause > corruption as higher level token mgmt in GPFS will take care of data > consistency. This is exactly what I wanted to be sure. Thanks! From ewahl at osc.edu Wed Nov 5 13:56:38 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 5 Nov 2014 13:56:38 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <545A001C.1040908@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> , <545A001C.1040908@ebi.ac.uk> Message-ID: You can designate how many of the nodes do token management as well. mmlscluster should show which are "manager"s. Under some circumstances you may want to increase the defaults on heavily used file systems using mmchnode, especially with few NSDs and many writers. 
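A rough sketch of what that looks like on the command line (the node name below is a placeholder, and it is worth checking the mmchnode man page for your GPFS level before relying on the exact flags):

    # show the cluster and which nodes carry the "manager" designation
    mmlscluster

    # designate an additional node as a manager node
    mmchnode --manager -N server3

    # the reverse operation is mmchnode --nonmanager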
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Wednesday, November 05, 2014 5:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] maybe a silly question about "old school" gpfs

On 05/11/14 10:25, Kalyan Gunda wrote:
> Also note that multiple nodes accessing the same disk will not cause
> corruption as higher level token mgmt in GPFS will take care of data
> consistency.
This is exactly what I wanted to be sure of. Thanks!

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From pavel.pokorny at datera.cz Fri Nov 7 11:15:34 2014 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Fri, 7 Nov 2014 12:15:34 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID:

Hello to all,
I would like to ask a question about the pagepool and the protection of data written through the pagepool. Is there a possibility of losing data written to GPFS in a situation where the data is stored in the pagepool but not yet written to disk? I think that for regular file system work this can be solved using the GPFS journal. What about using GPFS as an NFS store for VMware datastores?
Thank you for your answers,
Pavel
--
Ing. Pavel Pokorný DATERA s.r.o. | Ovocný trh 580/2 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz

From lhorrocks-barlow at ocf.co.uk Wed Nov 5 10:47:06 2014 From: lhorrocks-barlow at ocf.co.uk (Laurence Horrocks- Barlow) Date: Wed, 5 Nov 2014 10:47:06 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459FD15.3070105@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> <5459FD15.3070105@ebi.ac.uk> Message-ID: <545A002A.4080301@ocf.co.uk>

Hi Salvatore,

GSS and GPFS systems are different beasts. In a traditional GPFS configuration I would expect any NSD server to write to any/all LUN's that it can see as a local disk, providing it's part of the same FS. In GSS there is effectively a software RAID level added on top of the disks; with this I would expect only the RG owner to write down to the vdisk.

As for corruption, GPFS uses a token system to manage access to LUN's, metadata, etc.

Kind Regards,

Laurence Horrocks-Barlow Linux Systems Software Engineer OCF plc

Tel: +44 (0)114 257 2200 Fax: +44 (0)114 257 0022 Web: www.ocf.co.uk Blog: blog.ocf.co.uk Twitter: @ocfplc

OCF plc is a company registered in England and Wales. Registered number 4132533, VAT number GB 780 6803 14. Registered office address: OCF plc, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG.

This message is private and confidential. If you have received this message in error, please notify us and remove it from your system.

On 11/05/2014 10:33 AM, Salvatore Di Nardo wrote:
> I understand that my test its a bit particular because the client was
> also one of the servers.
> Usually clients don't have direct access to the storages, but still it
> made think, hot the things are supposed to work.
>
> For example i did another test with 3 dd's, one each server. All the
> servers was writing to all the luns.
> In other words a lun was accessed in parallel by 3 servers.
> > Its that a problem, or gpfs manage properly the concurrency and avoid > data corruption? > I'm asking because i was not expecting a server to write to an NSD he > doesn't own, even if its locally available. > I thought that the general availablity was for failover, not for > parallel access. > > > Regards, > Salvatore > > > > On 05/11/14 10:22, Vic Cornell wrote: >> Hi Salvatore, >> >> If you are doing the IO on the NSD server itself and it can see all >> of the NSDs it will use its "local? access to write to the LUNS. >> >> You need some GPFS clients to see the workload spread across all of >> the NSD servers. >> >> Vic >> >> >> >>> On 5 Nov 2014, at 10:15, Salvatore Di Nardo >> > wrote: >>> >>> Hello again, >>> to understand better GPFS, recently i build up an test gpfs cluster >>> using some old hardware that was going to be retired. THe storage >>> was SAN devices, so instead to use native raids I went for the old >>> school gpfs. the configuration is basically: >>> >>> 3x servers >>> 3x san storages >>> 2x san switches >>> >>> I did no zoning, so all the servers can see all the LUNs, but on nsd >>> creation I gave each LUN a primary, secondary and third server. with >>> the following rule: >>> >>> STORAGE >>> primary >>> secondary >>> tertiary >>> storage1 >>> server1 >>> server2 server3 >>> storage2 server2 server3 server1 >>> storage3 server3 server1 server2 >>> >>> >>> >>> looking at the mmcrnsd, it was my understanding that the primary >>> server is the one that wrote on the NSD unless it fails, then the >>> following server take the ownership of the lun. >>> >>> Now come the question: >>> when i did from server 1 a dd surprisingly i discovered that server1 >>> was writing to all the luns. the other 2 server was doing nothing. >>> this behaviour surprises me because on GSS only the RG owner can >>> write, so one server "ask" the other server to write to his own >>> RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on >>> each server. While i understand that the situation it's different >>> I'm puzzled about the fact that all the servers seems able to write >>> to all the luns. >>> >>> SAN deviced usually should be connected to one server only, as >>> paralled access could create data corruption. In environments where >>> you connect a SAN to multiple servers ( example VMWARE cloud) its >>> softeware task to avoid data overwriting between server ( and data >>> corruption ). >>> >>> Honestly, what i was expecting is: server1 writing on his own luns, >>> and data traffic ( ethernet) to the other 2 server , basically >>> asking *them* to write on the other luns. I dont know if this >>> behaviour its normal or not. I triied to find a documentation about >>> that, but could not find any. >>> >>> Could somebody tell me if this _/"every server write to all the >>> luns"/_ its intended or not? >>> >>> Thanks in advance, >>> Salvatore >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dhildeb at us.ibm.com Fri Nov 7 22:42:06 2014 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Fri, 7 Nov 2014 23:42:06 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? 
In-Reply-To: References: Message-ID: Hi Paul, GPFS correctly implements POSIX semantics and NFS close-to-open semantics. Its a little complicated, but effectively what this means is that when the application issues certain calls to ensure data/metadata is "stable" (e.g., fsync), then it is guaranteed to be stable. It also controls ordering between nodes among many other things. As part of making sure data is stable, the GPFS recovery journal is used in a variety of instances. With VMWare ESX using NFS to GPFS, then the same thing occurs, except the situation is even more simple since every write request will have the 'stable' flag set, ensuring it does writethrough to the storage system. Dean Hildebrand IBM Almaden Research Center From: Pavel Pokorny To: gpfsug-discuss at gpfsug.org Date: 11/07/2014 03:15 AM Subject: [gpfsug-discuss] GPFS - pagepool data protection? Sent by: gpfsug-discuss-bounces at gpfsug.org Hello to all, I would like to ask question about pagepool and protection of data written through pagepool. Is there a possibility of loosing data written to GPFS in situation that data are stored in pagepool but still not written to disks? I think that for regular file system work this can be solved using GPFS journal. What about using GPFS as a NFS store for VMware datastores? Thank you for your answers, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jamiedavis at us.ibm.com Sat Nov 8 23:13:17 2014 From: jamiedavis at us.ibm.com (James Davis) Date: Sat, 8 Nov 2014 18:13:17 -0500 Subject: [gpfsug-discuss] Hi everybody Message-ID: Hey all, My name is Jamie Davis and I work for IBM on the GPFS test team. I'm interested in learning more about how customers use GPFS and what typical questions and issues are like, and I thought joining this mailing list would be a good start. If my presence seems inappropriate or makes anyone uncomfortable I can leave the list. --- I don't know how often GPFS users look in /usr/lpp/mmfs/samples...but while I'm sending a mass email, I thought I'd take a moment to point anyone running GPFS 4.1.0.4 to /usr/lpp/mmfs/samples/ilm/mmfind. mmfind is basically a find-esque wrapper around mmapplypolicy that I wrote in response to complaints I've heard about the learning curve associated with writing policies for mmapplypolicy. Since it's in samples, use-at-your-own-risk and I make no promise that everything works correctly. The -skipPolicy and -saveTmpFiles flags will do everything but actually run mmapplypolicy -- I suggest you double-check its work before you run it on a production system. Please send me any comments on it if you give it a try! Jamie Davis GPFS Test IBM -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From chair at gpfsug.org Mon Nov 10 16:18:24 2014 From: chair at gpfsug.org (Jez Tucker) Date: Mon, 10 Nov 2014 16:18:24 +0000 Subject: [gpfsug-discuss] SC 14 and storagebeers events this week Message-ID: <5460E550.8020705@gpfsug.org>

Hi all

Just a quick reminder that the IBM GPFS User Group is at SC '14 in New Orleans on Nov 17th. Also, there's a social in London W1 - #storagebeers on Nov 13th. For more info on both of these, please see the main website: www.gpfsug.org

Best, Jez

From chair at gpfsug.org Tue Nov 11 13:59:38 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Tue, 11 Nov 2014 13:59:38 +0000 Subject: [gpfsug-discuss] storagebeers postponed Message-ID: <5462164A.70607@gpfsug.org>

Hi all

I've just received notification that #storagebeers, due to happen 13th Nov, has unfortunately had to be postponed. I'll update you all with a new date when I receive it.

Very best, Jez

From jez at rib-it.org Tue Nov 11 16:49:48 2014 From: jez at rib-it.org (Jez Tucker) Date: Tue, 11 Nov 2014 16:49:48 +0000 Subject: [gpfsug-discuss] Hi everybody In-Reply-To: References: Message-ID: <54623E2C.2070903@rib-it.org>

Hi Jamie,

You're indeed very welcome. A few of the IBM devs are list members and their presence is appreciated. I suggest that if you want to know more regarding use cases etc., ask some pointed questions. Discussion is good.

Jez

On 08/11/14 23:13, James Davis wrote:
> Hey all,
>
> My name is Jamie Davis and I work for IBM on the GPFS test team. I'm interested in learning more about how customers use GPFS and what typical questions and issues are like, and I thought joining this mailing list would be a good start. If my presence seems inappropriate or makes anyone uncomfortable I can leave the list.
>
> ---
>
> I don't know how often GPFS users look in /usr/lpp/mmfs/samples...but while I'm sending a mass email, I thought I'd take a moment to point anyone running GPFS 4.1.0.4 to /usr/lpp/mmfs/samples/ilm/mmfind. mmfind is basically a find-esque wrapper around mmapplypolicy that I wrote in response to complaints I've heard about the learning curve associated with writing policies for mmapplypolicy. Since it's in samples, use-at-your-own-risk and I make no promise that everything works correctly. The -skipPolicy and -saveTmpFiles flags will do everything but actually run mmapplypolicy -- I suggest you double-check its work before you run it on a production system.
>
> Please send me any comments on it if you give it a try!
>
> Jamie Davis
> GPFS Test
> IBM
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From pavel.pokorny at datera.cz Wed Nov 12 12:20:57 2014 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Wed, 12 Nov 2014 13:20:57 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID:

Hi,
thanks. As I understand it, the write process to a GPFS filesystem is:

1. Application on a node makes a write call
2. Token Manager stuff is done to coordinate the required byte range
3. mmfsd gets metadata from the file's metanode
4. mmfsd acquires a buffer from the page pool
5. Data is moved from the application data buffer to the page pool buffer
6. VSD layer copies data from the page pool to the send pool

and so on. What I am looking at and want to clarify is step 5.
Situation when data is moved to page pool. What happen if the server will crash at tjis point? Will GPFS use journal to get to stable state? Thank you, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o. | Ovocn? trh 580/2 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz On Sat, Nov 8, 2014 at 1:00 PM, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: GPFS - pagepool data protection? (Dean Hildebrand) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 7 Nov 2014 23:42:06 +0100 > From: Dean Hildebrand > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS - pagepool data protection? > Message-ID: > < > OF1ED92A57.DD700837-ONC1257D89.007C4EF1-88257D89.007CB453 at us.ibm.com> > Content-Type: text/plain; charset="iso-8859-1" > > > Hi Paul, > > GPFS correctly implements POSIX semantics and NFS close-to-open semantics. > Its a little complicated, but effectively what this means is that when the > application issues certain calls to ensure data/metadata is "stable" (e.g., > fsync), then it is guaranteed to be stable. It also controls ordering > between nodes among many other things. As part of making sure data is > stable, the GPFS recovery journal is used in a variety of instances. > > With VMWare ESX using NFS to GPFS, then the same thing occurs, except the > situation is even more simple since every write request will have the > 'stable' flag set, ensuring it does writethrough to the storage system. > > Dean Hildebrand > IBM Almaden Research Center > > > > > From: Pavel Pokorny > To: gpfsug-discuss at gpfsug.org > Date: 11/07/2014 03:15 AM > Subject: [gpfsug-discuss] GPFS - pagepool data protection? > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > Hello to all, > I would like to ask question about pagepool and protection of data written > through pagepool. > Is there a possibility of loosing data written to GPFS in situation that > data are stored in pagepool but still not written to disks? > I think that for regular file system work this can be solved using GPFS > journal. What about using GPFS as a NFS store for VMware datastores? > Thank you for your answers, > Pavel > -- > Ing. Pavel Pokorn? > DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic > www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.html > > > -------------- next part -------------- > A non-text attachment was scrubbed... 
> Name: graycol.gif > Type: image/gif > Size: 105 bytes > Desc: not available > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.gif > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 34, Issue 7 > ********************************************* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Wed Nov 12 14:05:03 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Wed, 12 Nov 2014 15:05:03 +0100 Subject: [gpfsug-discuss] IBM Software Defined Infrastructure Roadshow 2014 (Frankfurt, London & Paris) Message-ID: FYI: IBM Software Defined Infrastructure Roadshow 2014 (Frankfurt 02. Dec 2014, London 03. Dec 2014 & Paris 04. Dec 2014) https://www-950.ibm.com/events/wwe/grp/grp019.nsf/v17_events?openform&lp=platform_computing_roadshow&locale=en_GB P.S. The German GPFS technical team will be available for discussions in Frankfurt. Feel free to contact me. -frank- Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From dhildeb at us.ibm.com Sat Nov 15 20:31:53 2014 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Sat, 15 Nov 2014 12:31:53 -0800 Subject: [gpfsug-discuss] GPFS - pagepool data protection? In-Reply-To: References: Message-ID: Hi Pavel, You are more or less right in your description, but the key that I tried to convey in my first email is that GPFS only obey's POSIX. So your question can be answered by looking at how your application performs the write and does your application ask to make the data live only in the pagepool or on stable storage. By default posix says that file create and writes are unstable, so just doing a write puts it in the pagepool and will be lost if a crash occurs immediately after. To make it stable, the application must do something in posix to make it stable, of which there are many ways to do so, including but not limited to O_SYNC, DIO, some form of fsync post write, etc, etc... Dean Hildebrand IBM Almaden Research Center From: Pavel Pokorny To: gpfsug-discuss at gpfsug.org Date: 11/12/2014 04:21 AM Subject: Re: [gpfsug-discuss] GPFS - pagepool data protection? Sent by: gpfsug-discuss-bounces at gpfsug.org Hi, thanks. A I understand the write process to GPFS filesystem: 1.?Application on a node makes write call 2.?Token Manager stuff is done to coordinate the required-byte-range 3.?mmfsd gets metadata from the file?s metanode 4.?mmfsd acquires a buffer from the page pool 5.?Data is moved from application data buffer to page pool buffer 6. VSD layer copies data from the page pool to the send pool ?and so on. What I am looking at and want to clarify is step 5. Situation when data is moved to page pool. What happen if the server will crash at tjis point? Will GPFS use journal to get to stable state? Thank you, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz On Sat, Nov 8, 2014 at 1:00 PM, wrote: Send gpfsug-discuss mailing list submissions to ? ? ? ? gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit ? ? ? ? 
http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to ? ? ? ? gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at ? ? ? ? gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: ? ?1. Re: GPFS - pagepool data protection? (Dean Hildebrand) ---------------------------------------------------------------------- Message: 1 Date: Fri, 7 Nov 2014 23:42:06 +0100 From: Dean Hildebrand To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID: ? ? ? ? < OF1ED92A57.DD700837-ONC1257D89.007C4EF1-88257D89.007CB453 at us.ibm.com> Content-Type: text/plain; charset="iso-8859-1" Hi Paul, GPFS correctly implements POSIX semantics and NFS close-to-open semantics. Its a little complicated, but effectively what this means is that when the application issues certain calls to ensure data/metadata is "stable" (e.g., fsync), then it is guaranteed to be stable.? It also controls ordering between nodes among many other things.? As part of making sure data is stable, the GPFS recovery journal is used in a variety of instances. With VMWare ESX using NFS to GPFS, then the same thing occurs, except the situation is even more simple since every write request will have the 'stable' flag set, ensuring it does writethrough to the storage system. Dean Hildebrand IBM Almaden Research Center From:? ?Pavel Pokorny To:? ? ?gpfsug-discuss at gpfsug.org Date:? ?11/07/2014 03:15 AM Subject:? ? ? ? [gpfsug-discuss] GPFS - pagepool data protection? Sent by:? ? ? ? gpfsug-discuss-bounces at gpfsug.org Hello to all, I would like to ask question about pagepool and protection of data written through pagepool. Is there a possibility of loosing data written to GPFS in situation that data are stored in pagepool but still not written to disks? I think that for regular file system work this can be solved using GPFS journal. What about using GPFS as a NFS store for VMware datastores? Thank you for your answers, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.html > -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.gif > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 34, Issue 7 ********************************************* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From seanlee at tw.ibm.com Mon Nov 17 09:49:39 2014 From: seanlee at tw.ibm.com (Sean S Lee) Date: Mon, 17 Nov 2014 17:49:39 +0800 Subject: [gpfsug-discuss] GPFS - pagepool data protection? In-Reply-To: References: Message-ID: Hi Pavel, Most popular filesystems work that way. Write buffering improves the performance at the expense of some risk. Today most applications and all modern OS correctly handle "crash consistency", meaning they can recover from uncommitted writes. If you have data which absolutely cannot tolerate any "in-flight" data loss, it requires significant planning and resources on multiple levels, but as far as GPFS is concerned you could create a small file system and data (VMDK's) or serve GPFS or cNFS (mount GPFS with "syncfs", mount NFS with sync,no_wdelay) to VM clients from those filesystems. Your VM OS (VMDK) could be on a regular GPFS file system and your app data and logs could be on a small GPFS with synchronous writes. Regards Sean -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavel.pokorny at datera.cz Mon Nov 17 12:49:26 2014 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Mon, 17 Nov 2014 13:49:26 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID: Hello, thanks you for all the answers, It is more clear now. Regards, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o. | Ovocn? trh 580/2 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz On Mon, Nov 17, 2014 at 1:00 PM, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. GPFS - pagepool data protection? (Sean S Lee) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 17 Nov 2014 17:49:39 +0800 > From: Sean S Lee > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] GPFS - pagepool data protection? > Message-ID: > < > OF20A72494.9E59B93F-ON48257D93.00350BA6-48257D93.0035F912 at tw.ibm.com> > Content-Type: text/plain; charset="us-ascii" > > > Hi Pavel, > > Most popular filesystems work that way. > > Write buffering improves the performance at the expense of some risk. > Today most applications and all modern OS correctly handle "crash > consistency", meaning they can recover from uncommitted writes. > > If you have data which absolutely cannot tolerate any "in-flight" data > loss, it requires significant planning and resources on multiple levels, > but as far as GPFS is concerned you could create a small file system and > data (VMDK's) or serve GPFS or cNFS (mount GPFS with "syncfs", mount NFS > with sync,no_wdelay) to VM clients from those filesystems. > Your VM OS (VMDK) could be on a regular GPFS file system and your app data > and logs could be on a small GPFS with synchronous writes. > > Regards > Sean > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141117/1eb905cc/attachment-0001.html >
>
> ------------------------------
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> End of gpfsug-discuss Digest, Vol 34, Issue 13
> **********************************************

From orlando.richards at ed.ac.uk Wed Nov 19 16:35:44 2014 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Wed, 19 Nov 2014 16:35:44 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests Message-ID: <546CC6E0.1010800@ed.ac.uk>

Hi folks,

Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem-owning cluster?

This is not using GPFS for OpenStack block/image storage - but using GPFS as a "NAS" service, with OpenStack guest instances as a "GPFS client".

--- Orlando

-- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

From S.J.Thompson at bham.ac.uk Wed Nov 19 18:36:30 2014 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 19 Nov 2014 18:36:30 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: <546CC6E0.1010800@ed.ac.uk> References: <546CC6E0.1010800@ed.ac.uk> Message-ID:

I was asking this question at the GPFS forum on Monday at SC, but there didn't seem to be much on how we could do it.

One of the suggestions was to basically use NFS, or there is the Manila component of OpenStack coming, but still that isn't really true GPFS access.

I did wonder about virtio, but whether that would work with GPFS passed from the hosting system.

Simon
________________________________________
From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Orlando Richards [orlando.richards at ed.ac.uk] Sent: 19 November 2014 16:35 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] GPFS inside OpenStack guests

Hi folks,

Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster?

This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client".

--- Orlando

-- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From oehmes at gmail.com Wed Nov 19 19:00:50 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 19 Nov 2014 11:00:50 -0800 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: <546CC6E0.1010800@ed.ac.uk> References: <546CC6E0.1010800@ed.ac.uk> Message-ID:

Technically there are multiple ways to do this.

1. you can use the NSD protocol for this, just need to have adequate network resources (or use PCI pass-through of the network adapter to the guest)
2.
you attach the physical disks as virtio block devices 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... Sven On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards wrote: > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to > connect to an existing (traditional, "bare metal") GPFS filesystem owning > cluster? > > This is not using GPFS for openstack block/image storage - but using GPFS > as a "NAS" service, with openstack guest instances as as a "GPFS client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in Scotland, > with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Nov 19 19:03:55 2014 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 19 Nov 2014 19:03:55 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk>, Message-ID: Yes, what about the random name nature of a vm image? For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Sven Oehme [oehmes at gmail.com] Sent: 19 November 2014 19:00 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS inside OpenStack guests technically there are multiple ways to do this. 1. you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) 2. you attach the physical disks as virtio block devices 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... Sven On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards > wrote: Hi folks, Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". --- Orlando -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
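As an aside on Sven's caching point quoted above: with KVM/QEMU, host-side caching for a virtio disk is switched off per drive, roughly like this (a sketch only -- the device path is a placeholder, and whether presenting SAN LUNs to guests this way is a supported GPFS configuration is a separate question):

    # fragment of a qemu command line: raw block device, virtio, no host caching
    -drive file=/dev/mapper/gpfs_lun01,if=virtio,format=raw,cache=none

The equivalent libvirt disk XML sets cache='none' on the driver element.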
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chekh at stanford.edu Wed Nov 19 19:37:50 2014 From: chekh at stanford.edu (Alex Chekholko) Date: Wed, 19 Nov 2014 11:37:50 -0800 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk>, Message-ID: <546CF18E.3010802@stanford.edu> Just make the new VMs NFS clients, no? It's so much simpler and the performance is not much less. But you do need to run CNFS in the GPFS cluster. On 11/19/14 11:03 AM, Simon Thompson (Research Computing - IT Services) wrote: > > Yes, what about the random name nature of a vm image? > > For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? > > And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Sven Oehme [oehmes at gmail.com] > Sent: 19 November 2014 19:00 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS inside OpenStack guests > > technically there are multiple ways to do this. > > 1. you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) > 2. you attach the physical disks as virtio block devices > 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. > > if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... > > Sven > > > > > > On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards > wrote: > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? > > This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Alex Chekholko chekh at stanford.edu From orlando.richards at ed.ac.uk Wed Nov 19 20:56:32 2014 From: orlando.richards at ed.ac.uk (orlando.richards at ed.ac.uk) Date: Wed, 19 Nov 2014 20:56:32 +0000 (GMT) Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk> Message-ID: On Wed, 19 Nov 2014, Simon Thompson (Research Computing - IT Services) wrote: > > I was asking this question at the gpfs forum on Monday at sc, but there didn't seem to be much in how wr could do it. > > One of the suggestions was to basically use nfs, or there is the Manilla compnents of Openstack coming, but still that isn't really true gpfs access. 
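As a rough sketch of what the NFS route amounts to (server name, export path and fsid are invented for the example, and for anything serious you would want CNFS or similar rather than a single hand-rolled export):

# on a cluster node that already mounts the file system
echo "/gpfs/gpfs0/projects *(rw,sync,no_subtree_check,fsid=745)" >> /etc/exports
exportfs -ra

# inside the OpenStack guest, over the tenant network
mount -t nfs nfsserver.example.com:/gpfs/gpfs0/projects /mnt/projects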
> NFS should be easy enough - but you can lose a lot of the gpfs good-ness by doing that (acl's, cloning, performance?, etc). > I did wonder about virtio, but whether that would work with gpfs passed from the hosting system. I was more looking for something fairly native - so that we don't have to, for example, start heavily customising the hypervisor stack. In fact - if you're pushing out to a third-party service provider cloud (and that could be your internal organisation's cloud run as a separate service) then you don't have that option at all. I've not dug into virtio much in a basic kvm hypervisor, but one of the guys in EPCC has been trying it out. Initial impressions (once he got it working!) were tarred by terrible performance. I've not caught up with how he got on after that initial look. > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Orlando Richards [orlando.richards at ed.ac.uk] > Sent: 19 November 2014 16:35 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] GPFS inside OpenStack guests > > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to > connect to an existing (traditional, "bare metal") GPFS filesystem > owning cluster? > > This is not using GPFS for openstack block/image storage - but using > GPFS as a "NAS" service, with openstack guest instances as as a "GPFS > client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From orlando.richards at ed.ac.uk Wed Nov 19 20:56:38 2014 From: orlando.richards at ed.ac.uk (orlando.richards at ed.ac.uk) Date: Wed, 19 Nov 2014 20:56:38 +0000 (GMT) Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk>, Message-ID: On Wed, 19 Nov 2014, Simon Thompson (Research Computing - IT Services) wrote: > > Yes, what about the random name nature of a vm image? > > > For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? I *think* this bit should be solvable - assuming one can pre-define the range of names the node will have, and can pre-populate your gpfs cluster config with these node names. The guest image should then have the full /var/mmfs tree (pulled from another gpfs node), but with the /var/mmfs/gen/mmfsNodeData file removed. When it starts up, it'll figure out "who" it is and regenerate that file, pull the latest cluster config from the primary config server, and start up. > And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? 
This bit is where I can see the potential pitfall. OpenStack naturally uses NAT to handle traffic to and from guests - will GPFS cope with nat'ted clients in this way? Fair point on NFS from Alex - but will you get the same multi-threaded performance from NFS compared with GPFS? Also - could you make each hypervisor an NFS server for its guests, thus doing away with the need for CNFS, and removing the potential for the nfs server threads bottlenecking? For instance - if I have 300 worker nodes, and 7 NSD servers - I'd then have 300 NFS servers running, rather than 7 NFS servers. Direct block access to the storage from the hypervisor would also be possible (network configuration permitting), and the NFS traffic would flow only over a "virtual" network within the hypervisor, and so "should" (?) be more efficient. > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Sven Oehme [oehmes at gmail.com] > Sent: 19 November 2014 19:00 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS inside OpenStack guests > > technically there are multiple ways to do this. > > 1. you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) > 2. you attach the physical disks as virtio block devices > 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. > > if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... > > Sven > > > > > > On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards > wrote: > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? > > This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From S.J.Thompson at bham.ac.uk Thu Nov 20 00:20:44 2014 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 20 Nov 2014 00:20:44 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk> Message-ID: On 19/11/2014 14:56, "orlando.richards at ed.ac.uk" wrote: >> >>And how about attaching to the netowkrk as neutron networking uses per >>tenant networks, so how would you actually get access to the gpfs >>cluster? > >This bit is where I can see the potential pitfall. 
OpenStack naturally >uses NAT to handle traffic to and from guests - will GPFS cope with >nat'ted clients in this way? Well, not necessarily, I was thinking about this and potentially you could create an external shared network which is bound to your GPFS interface, though there?s possible security questions maybe around exposing a real internal network device into a VM. I think there is also a Mellanox driver for the VPI Pro cards which allow you to pass the card through to instances. I can?t remember if that was just acceleration for Ethernet or if it could do IB as well. >Also - could you make each hypervisor an NFS server for its guests, thus >doing away with the need for CNFS, and removing the potential for the nfs >server threads bottlenecking? For instance - if I have 300 worker nodes, >and 7 NSD servers - I'd then have 300 NFS servers running, rather than 7 Would you then not need to have 300 server licenses though? Simon From jonathan at buzzard.me.uk Thu Nov 20 10:03:01 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 20 Nov 2014 10:03:01 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk> , Message-ID: <1416477781.4171.23.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-11-19 at 20:56 +0000, orlando.richards at ed.ac.uk wrote: > On Wed, 19 Nov 2014, Simon Thompson (Research Computing - IT Services) > wrote: > > > > > Yes, what about the random name nature of a vm image? > > > > > > For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? > > > I *think* this bit should be solvable - assuming one can pre-define the > range of names the node will have, and can pre-populate your gpfs cluster > config with these node names. The guest image should then have the full > /var/mmfs tree (pulled from another gpfs node), but with the > /var/mmfs/gen/mmfsNodeData file removed. When it starts up, it'll figure > out "who" it is and regenerate that file, pull the latest cluster config > from the primary config server, and start up. It's perfectly solvable with a bit of scripting and putting the cluster into admin mode central. > > > > > And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? > > This bit is where I can see the potential pitfall. OpenStack naturally > uses NAT to handle traffic to and from guests - will GPFS cope with > nat'ted clients in this way? Not going to work with NAT. GPFS has some "funny" ideas about networking, but to put it succinctly all the nodes have to be on the same class A, B or C network. Though it considers every address in a class A network to be on the same network even though you may have divided it up internally into different networks. Consequently the network model in GPFS is broken. You would need to use bridged mode aka FlatNetworking in OpenStacks for this to work, but surely Jan knows all this. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From janfrode at tanso.net Fri Nov 21 19:35:48 2014 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 21 Nov 2014 20:35:48 +0100 Subject: [gpfsug-discuss] Gathering node/fs statistics ? 
Message-ID: <20141121193548.GA11920@mushkin.tanso.net> I'm considering writing a Performance CoPilot agent (PMDA, Performance Metrics Domain Agent) for GPFS, and would like to collect all/most of the metrics that are already available in the gpfs SNMP agent -- ideally without using SNMP.. So, could someone help me with where to find GPFS performance data? I've noticed "mmfsadm" has a "resetstats" option, but what are these stats / where can I find them? All in mmpmon? Also the GPFS-MIB.txt seems to point at some commands I'm unfamiliar with: -- all other node data from EE "get nodes" command -- Status info from EE "get fs -b" command -- Performance data from mmpmon "gfis" command -- Storage pool table comes from EE "get pools" command -- Storage pool data comes from SDR and EE "get pools" command -- Disk data from EE "get fs" command -- Disk performance data from mmpmon "ds" command: -- From mmpmon nc: Any idea what 'EE "get nodes"' is? And what do they mean by 'mmpmon "gfis"', "nc" or "ds"? These commands doesn't work when fed to mmpmon.. -jf From oehmes at gmail.com Fri Nov 21 20:15:16 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 21 Nov 2014 12:15:16 -0800 Subject: [gpfsug-discuss] Gathering node/fs statistics ? In-Reply-To: <20141121193548.GA11920@mushkin.tanso.net> References: <20141121193548.GA11920@mushkin.tanso.net> Message-ID: Hi, you should take a look at the following 3 links : my performance talk about GPFS , take a look at the dstat plugin mentioned in the charts : http://www.gpfsug.org/wp-content/uploads/2014/05/UG10_GPFS_Performance_Session_v10.pdf documentation about the mmpmon interface and use in GPFS : http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_mmpmonch.htm documentation about GSS/ESS/GNR in case you care about this as well and its additional mmpmon commands : http://www-01.ibm.com/support/knowledgecenter/SSFKCN/bl1du14a.pdf Sven On Fri, Nov 21, 2014 at 11:35 AM, Jan-Frode Myklebust wrote: > I'm considering writing a Performance CoPilot agent (PMDA, Performance > Metrics Domain Agent) for GPFS, and would like to collect all/most of > the metrics that are already available in the gpfs SNMP agent -- ideally > without using SNMP.. > > So, could someone help me with where to find GPFS performance data? I've > noticed "mmfsadm" has a "resetstats" option, but what are these stats / > where can I find them? All in mmpmon? > > Also the GPFS-MIB.txt seems to point at some commands I'm unfamiliar > with: > > -- all other node data from EE "get nodes" command > -- Status info from EE "get fs -b" command > -- Performance data from mmpmon "gfis" command > -- Storage pool table comes from EE "get pools" command > -- Storage pool data comes from SDR and EE "get pools" command > -- Disk data from EE "get fs" command > -- Disk performance data from mmpmon "ds" command: > -- From mmpmon nc: > > > Any idea what 'EE "get nodes"' is? And what do they mean by 'mmpmon > "gfis"', "nc" or "ds"? These commands doesn't work when fed to mmpmon.. > > > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Nov 21 20:29:05 2014 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 21 Nov 2014 14:29:05 -0600 Subject: [gpfsug-discuss] Gathering node/fs statistics ? 
In-Reply-To: <20141121193548.GA11920@mushkin.tanso.net> References: <20141121193548.GA11920@mushkin.tanso.net> Message-ID: You might want to look at Arxview, www.arxscan.com. I've been working with them and they have good GPFS and Storage monitoring based on mmpmon. Lightweight too. Bob Oesterlin Nuance Communications On Friday, November 21, 2014, Jan-Frode Myklebust wrote: > I'm considering writing a Performance CoPilot agent (PMDA, Performance > Metrics Domain Agent) for GPFS, and would like to collect all/most of > the metrics that are already available in the gpfs SNMP agent -- ideally > without using SNMP.. > > So, could someone help me with where to find GPFS performance data? I've > noticed "mmfsadm" has a "resetstats" option, but what are these stats / > where can I find them? All in mmpmon? > > Also the GPFS-MIB.txt seems to point at some commands I'm unfamiliar > with: > > -- all other node data from EE "get nodes" command > -- Status info from EE "get fs -b" command > -- Performance data from mmpmon "gfis" command > -- Storage pool table comes from EE "get pools" command > -- Storage pool data comes from SDR and EE "get pools" command > -- Disk data from EE "get fs" command > -- Disk performance data from mmpmon "ds" command: > -- From mmpmon nc: > > > Any idea what 'EE "get nodes"' is? And what do they mean by 'mmpmon > "gfis"', "nc" or "ds"? These commands doesn't work when fed to mmpmon.. > > > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From sabujp at gmail.com Fri Nov 21 22:50:02 2014 From: sabujp at gmail.com (Sabuj Pattanayek) Date: Fri, 21 Nov 2014 16:50:02 -0600 Subject: [gpfsug-discuss] any difference with the filespace view mmbackup sees from a global snapshot vs a snapshot on -j root with only 1 independent fileset (root)? Message-ID: Hi all, We're running 3.5.0.19 . Is there any difference in terms of the view of the filespace that mmbackup sees and then passes to TSM if we run mmbackup against a global snapshot vs a snapshot on -j root if we only have and ever plan on having one independent fileset (root)? It doesn't look like it to me just from ls, but just verifying. We want to get away from using a global snapshot if possible (and start using -j root snapshots) instead because for some reason it looks like it takes much much longer to run mmdelsnapshot on a global snapshot vs a snapshot on the root fileset. Thanks, Sabuj -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Mon Nov 24 21:22:19 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Mon, 24 Nov 2014 22:22:19 +0100 Subject: [gpfsug-discuss] restripe or not Message-ID: <5473A18B.7000702@ugent.be> hi all, we are going to expand an existing filestytem with approx 50% capacity. the current filesystem is 75% full. we are in downtime (for more then just this reason), so we can take the IO rebalance hit for a while (say max 48hours). my questions: a. do we really need to rebalance? the mmadddisk page suggest normally it's going to be ok, but i never understood that. new data will end up mainly on new disks, so wrt to performance, this can't really work out, can it? b. can we change the priority of rebalancing somehow (fewer nodes taking part in the rebalance?) c. 
once we start the rebalance, how save is it to stop with kill or ctrl-c (or can we say eg. rebalance 25% now, rest later?) (and how often can we do this? eg a daily cron job to restripe at max one hour per day, would this cause issue in the long term many thanks, stijn From zgiles at gmail.com Mon Nov 24 23:14:21 2014 From: zgiles at gmail.com (Zachary Giles) Date: Mon, 24 Nov 2014 18:14:21 -0500 Subject: [gpfsug-discuss] restripe or not In-Reply-To: <5473A18B.7000702@ugent.be> References: <5473A18B.7000702@ugent.be> Message-ID: Interesting question.. Just some thoughts: Not an expert on restriping myself: * Your new storage -- is it the same size, shape, speed as the old storage? If not, then are you going to add it to the same storage pool, or an additional storage pool? If additional, restripe is not needed, as you can't / don't need to restripe across storage pools, the data will be in one or the other. However, you of course will need to make a policy to place data correctly. Of course, if you're going to double your storage and all your new data will be written to the new disks, then you may be leaving quite a bit of capacity on the floor. * mmadddisk man page and normal balancing -- yes, we've seen this suggestion as well -- that is, that new data will generally fill across the cluster and eventually fill in the gaps. We didn't restripe on a much smaller storage pool and it eventually did balance out, however, it was also a "tier 1" where data is migrated out often. If I were doubling my primary storage with more of the exact same disks, I'd probably restripe. * Stopping a restripe -- I'm """ Pretty Sure """ you can stop a restripe safely with a Ctrl-C. I'm """ Pretty Sure """ we've done that a few times ourselves with no problem. I remember I was going to restripe something but the estimates were too high and so I stopped it. I'd feel fairly confident in doing it, but I don't want to take responsibility for your storage. :) :) I don't think there's a need to restripe every hour or anything. If you're generally balanced at one point, you'd probably continue to be under normal operation. On Mon, Nov 24, 2014 at 4:22 PM, Stijn De Weirdt wrote: > hi all, > > we are going to expand an existing filestytem with approx 50% capacity. > the current filesystem is 75% full. > > we are in downtime (for more then just this reason), so we can take the IO > rebalance hit for a while (say max 48hours). > > my questions: > a. do we really need to rebalance? the mmadddisk page suggest normally > it's going to be ok, but i never understood that. new data will end up > mainly on new disks, so wrt to performance, this can't really work out, can > it? > b. can we change the priority of rebalancing somehow (fewer nodes taking > part in the rebalance?) > c. once we start the rebalance, how save is it to stop with kill or ctrl-c > (or can we say eg. rebalance 25% now, rest later?) > (and how often can we do this? eg a daily cron job to restripe at max one > hour per day, would this cause issue in the long term > > > many thanks, > > stijn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
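A sketch of the relevant commands, for reference -- file system, NSD and node names are placeholders, so check them against your own cluster before running anything:

# stop new block allocation on one of the old NSDs (existing data stays readable and writable)
mmchdisk gpfs0 suspend -d "old_nsd_01"

# rebalance, restricted to the NSD server nodes
mmrestripefs gpfs0 -b -N nsdserver1,nsdserver2,nsdserver3

# watch how evenly the disks are filling
mmdf gpfs0

# put the suspended disk back into normal allocation afterwards
mmchdisk gpfs0 resume -d "old_nsd_01"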
URL: From oester at gmail.com Tue Nov 25 02:01:06 2014 From: oester at gmail.com (Bob Oesterlin) Date: Mon, 24 Nov 2014 20:01:06 -0600 Subject: [gpfsug-discuss] restripe or not In-Reply-To: <5473A18B.7000702@ugent.be> References: <5473A18B.7000702@ugent.be> Message-ID: In general, the need to restripe after a disk add is dependent on a number of factors, as has been pointed out.. A couple of other thoughts/suggestions: - One thing you might consider (depending on your pattern of read/write traffic), is selectively suspending one or more of the existing NSDs, forcing GPFS to write new blocks to the new NSDs. That way at least some of the new data is being written to the new storage by default, rather than using up blocks on the existing NSDs. You can suspend/resume disks at any time. - You can pick a subset of nodes to perform the restripe with "mmrestripefs -N node1,node2,..." Keep in mind you'll get much better performance and less impact to the filesystem if you choose NSD servers with direct access to the disk. - Resume of restripe: Yes, you can do this, no harm, done it many times. You can track the balance of the disks using "mmdf ". This is a pretty intensive command, so I wouldn't run in frequently. Check it a few times each day, see if the data balance is improving by itself. When you stop/restart it, the restripe doesn't pick up exactly where it left off, it's going to scan the entire file system again. - You can also restripe single files if the are large and get a heavy I/O (mmrestripefile) Bob Oesterlin Nuance Communications On Mon, Nov 24, 2014 at 3:22 PM, Stijn De Weirdt wrote: > hi all, > > we are going to expand an existing filestytem with approx 50% capacity. > the current filesystem is 75% full. > > we are in downtime (for more then just this reason), so we can take the IO > rebalance hit for a while (say max 48hours). > > my questions: > a. do we really need to rebalance? the mmadddisk page suggest normally > it's going to be ok, but i never understood that. new data will end up > mainly on new disks, so wrt to performance, this can't really work out, can > it? > b. can we change the priority of rebalancing somehow (fewer nodes taking > part in the rebalance?) > c. once we start the rebalance, how save is it to stop with kill or ctrl-c > (or can we say eg. rebalance 25% now, rest later?) > (and how often can we do this? eg a daily cron job to restripe at max one > hour per day, would this cause issue in the long term > > > many thanks, > > stijn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Tue Nov 25 07:17:56 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 25 Nov 2014 08:17:56 +0100 Subject: [gpfsug-discuss] restripe or not In-Reply-To: References: <5473A18B.7000702@ugent.be> Message-ID: <54742D24.8090602@ugent.be> hi zachary, > * Your new storage -- is it the same size, shape, speed as the old storage? yes. we created and used it as "test" filesystem on the same hardware when we started. now we are shrinking the test filesystem and adding the free disks to the production one. > If not, then are you going to add it to the same storage pool, or an > additional storage pool? If additional, restripe is not needed, as you > can't / don't need to restripe across storage pools, the data will be in > one or the other. 
However, you of course will need to make a policy to > place data correctly. sure, but in this case, they end up in teh same pool. > Of course, if you're going to double your storage and all your new data > will be written to the new disks, then you may be leaving quite a bit of > capacity on the floor. > > * mmadddisk man page and normal balancing -- yes, we've seen this > suggestion as well -- that is, that new data will generally fill across the > cluster and eventually fill in the gaps. We didn't restripe on a much > smaller storage pool and it eventually did balance out, however, it was > also a "tier 1" where data is migrated out often. If I were doubling my > primary storage with more of the exact same disks, I'd probably restripe. more then half of the data on the current filesystem is more or less static (we expect it to stay there 2-3 year unmodified). similar data will be added in the near future. > > * Stopping a restripe -- I'm """ Pretty Sure """ you can stop a restripe > safely with a Ctrl-C. I'm """ Pretty Sure """ we've done that a few times > ourselves with no problem. I remember I was going to restripe something but > the estimates were too high and so I stopped it. I'd feel fairly confident > in doing it, but I don't want to take responsibility for your storage. :) yeah, i've also remember cancelling a restripe and i'm pretty sure it ddin't cause problems (i would certainly remember the problems ;) i'm looking for some further confirmation (or e.g. a reference to some docuemnt that says so. i vaguely remember sven(?) saying this on the lodon gpfs user day this year. > :) I don't think there's a need to restripe every hour or anything. If > you're generally balanced at one point, you'd probably continue to be under > normal operation. i was thinking to spread the total restripe over one or 2 hour periods each days the coming week(s); but i'm now realising this might not be the best idea, because it will rebalance any new data as well, slowing down the bulk rebalancing. anyway, thanks for the feedback. i'll probably let the rebalance run for 48 hours and see how far it got by that time. stijn > > > > > > On Mon, Nov 24, 2014 at 4:22 PM, Stijn De Weirdt > wrote: > >> hi all, >> >> we are going to expand an existing filestytem with approx 50% capacity. >> the current filesystem is 75% full. >> >> we are in downtime (for more then just this reason), so we can take the IO >> rebalance hit for a while (say max 48hours). >> >> my questions: >> a. do we really need to rebalance? the mmadddisk page suggest normally >> it's going to be ok, but i never understood that. new data will end up >> mainly on new disks, so wrt to performance, this can't really work out, can >> it? >> b. can we change the priority of rebalancing somehow (fewer nodes taking >> part in the rebalance?) >> c. once we start the rebalance, how save is it to stop with kill or ctrl-c >> (or can we say eg. rebalance 25% now, rest later?) >> (and how often can we do this? 
eg a daily cron job to restripe at max one >> hour per day, would this cause issue in the long term >> >> >> many thanks, >> >> stijn >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From stijn.deweirdt at ugent.be Tue Nov 25 07:23:41 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 25 Nov 2014 08:23:41 +0100 Subject: [gpfsug-discuss] restripe or not In-Reply-To: References: <5473A18B.7000702@ugent.be> Message-ID: <54742E7D.7090009@ugent.be> hi bob, > - One thing you might consider (depending on your pattern of read/write > traffic), is selectively suspending one or more of the existing NSDs, > forcing GPFS to write new blocks to the new NSDs. That way at least some of > the new data is being written to the new storage by default, rather than > using up blocks on the existing NSDs. You can suspend/resume disks at any > time. is the gpfs placment weighted with the avalaible volume? i'd rather not make this a manual operation. > > - You can pick a subset of nodes to perform the restripe with "mmrestripefs > -N node1,node2,..." Keep in mind you'll get much better performance and > less impact to the filesystem if you choose NSD servers with direct access > to the disk. yes and i no i guess, our nsds see all disks, but the problem with nsds is that they don't honour any roles (our primary nsds have the preferred path to the controller and lun, meaning all access from non-primary nsd to that disk is suboptimal). > > - Resume of restripe: Yes, you can do this, no harm, done it many times. > You can track the balance of the disks using "mmdf ". This is a > pretty intensive command, so I wouldn't run in frequently. Check it a few > times each day, see if the data balance is improving by itself. When you thanks for the tip to monitor it with mmdf! > stop/restart it, the restripe doesn't pick up exactly where it left off, > it's going to scan the entire file system again. yeah, i realised that this is a flaw in my "one-hour a day" restripe idea ;) > > - You can also restripe single files if the are large and get a heavy I/O > (mmrestripefile) excellent tip! forgot about that one. if the rebalnce is to slow, i can run this on the static data. thanks a lot for the feedback stijn > > Bob Oesterlin > Nuance Communications > > > On Mon, Nov 24, 2014 at 3:22 PM, Stijn De Weirdt > wrote: > >> hi all, >> >> we are going to expand an existing filestytem with approx 50% capacity. >> the current filesystem is 75% full. >> >> we are in downtime (for more then just this reason), so we can take the IO >> rebalance hit for a while (say max 48hours). >> >> my questions: >> a. do we really need to rebalance? the mmadddisk page suggest normally >> it's going to be ok, but i never understood that. new data will end up >> mainly on new disks, so wrt to performance, this can't really work out, can >> it? >> b. can we change the priority of rebalancing somehow (fewer nodes taking >> part in the rebalance?) >> c. once we start the rebalance, how save is it to stop with kill or ctrl-c >> (or can we say eg. rebalance 25% now, rest later?) >> (and how often can we do this? 
eg a daily cron job to restripe at max one >> hour per day, would this cause issue in the long term >> >> >> many thanks, >> >> stijn >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > From L.A.Hurst at bham.ac.uk Tue Nov 25 10:45:51 2014 From: L.A.Hurst at bham.ac.uk (Laurence Alexander Hurst) Date: Tue, 25 Nov 2014 10:45:51 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users Message-ID: Hi all, We have noticed that once users are deleted (gone entirely from the passwd backend) and all their files removed, their uid is still reported by GPFS? quota tools (albeit with zero files and space usage). Does anyone know if there is a way to clear out these spurious entries or is it a case that once a uid is in the quota database its there forever regardless of if that uid is still in use and has quota to record? Many Thanks, Laurence From ewahl at osc.edu Tue Nov 25 13:52:55 2014 From: ewahl at osc.edu (Wahl, Edward) Date: Tue, 25 Nov 2014 13:52:55 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: References: Message-ID: <1416923575.2343.18.camel@localhost.localdomain> Do you still have policies or filesets associated with these users? Ed Wahl OSC On Tue, 2014-11-25 at 10:45 +0000, Laurence Alexander Hurst wrote: > Hi all, > > We have noticed that once users are deleted (gone entirely from the passwd backend) and all their files removed, their uid is still reported by GPFS? quota tools (albeit with zero files and space usage). > > Does anyone know if there is a way to clear out these spurious entries or is it a case that once a uid is in the quota database its there forever regardless of if that uid is still in use and has quota to record? > > Many Thanks, > > Laurence > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Tue Nov 25 14:00:29 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Nov 2014 14:00:29 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: References: Message-ID: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> On Tue, 2014-11-25 at 10:45 +0000, Laurence Alexander Hurst wrote: > Hi all, > > We have noticed that once users are deleted (gone entirely from the > passwd backend) and all their files removed, their uid is still > reported by GPFS? quota tools (albeit with zero files and space usage). > There is something somewhere that references them, because they do disappear. I know because I cleared out a GPFS file system that had files and directories used by "depreciated" user and group names, and the check I was using to make sure I had got everything belonging to a particular user or group was mmrepquota. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From stijn.deweirdt at ugent.be Tue Nov 25 16:25:58 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 25 Nov 2014 17:25:58 +0100 Subject: [gpfsug-discuss] gpfs.gnr updates Message-ID: <5474AD96.3050006@ugent.be> hi all, does anyone know where we can find the release notes and update rpms for gpfs.gnr? we logged a case with ibm a while ago, and we assumed that the fix for the issue was part of the regular gpfs updates (we assumed as much from the conversation with ibm tech support). 
many thanks, stijn From L.A.Hurst at bham.ac.uk Wed Nov 26 10:14:26 2014 From: L.A.Hurst at bham.ac.uk (Laurence Alexander Hurst) Date: Wed, 26 Nov 2014 10:14:26 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> Message-ID: Hmm, mmrepquota is reporting no files owned by any of the users in question. I?ll see if `find` disagrees. They have the default fileset user quotas applied, so they?re not users we?ve edited to grant quota extensions to. We have had a problem (which IBM have acknowledged, iirc) whereby it is not possible to reset a user?s quota back to the default if it has been modified, perhaps this is related? I?ll see if `find` turns anything up or I?ll raise a ticket with IBM and see what they think. I?ve pulled out a single example, but all 75 users I have are the same. mmrepquota gpfs | grep 8695 8695 nbu USR 0 0 5368709120 0 none | 0 0 0 0 none 8695 bb USR 0 0 1073741824 0 none | 0 0 0 0 none Thanks for your input. Laurence On 25/11/2014 14:00, "Jonathan Buzzard" wrote: >On Tue, 2014-11-25 at 10:45 +0000, Laurence Alexander Hurst wrote: >> Hi all, >> >> We have noticed that once users are deleted (gone entirely from the >> passwd backend) and all their files removed, their uid is still >> reported by GPFS? quota tools (albeit with zero files and space usage). >> > >There is something somewhere that references them, because they do >disappear. I know because I cleared out a GPFS file system that had >files and directories used by "depreciated" user and group names, and >the check I was using to make sure I had got everything belonging to a >particular user or group was mmrepquota. > >JAB. > >-- >Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk >Fife, United Kingdom. > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chair at gpfsug.org Thu Nov 27 09:21:30 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 27 Nov 2014 09:21:30 +0000 Subject: [gpfsug-discuss] gpfs.gnr updates In-Reply-To: <5474AD96.3050006@ugent.be> References: <5474AD96.3050006@ugent.be> Message-ID: <5476ED1A.8050504@gpfsug.org> Hi Stijn, As far as I am aware, GNR updates are not publicly available for download. You should approach your reseller / IBM Business partner who should be able to supply you with the updates. IBMers, please feel free to correct this statement if in error. Jez On 25/11/14 16:25, Stijn De Weirdt wrote: > hi all, > > does anyone know where we can find the release notes and update rpms > for gpfs.gnr? > we logged a case with ibm a while ago, and we assumed that the fix for > the issue was part of the regular gpfs updates (we assumed as much > from the conversation with ibm tech support). 
> > many thanks, > > stijn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Thu Nov 27 09:47:59 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Nov 2014 09:47:59 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> Message-ID: <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: > Hmm, mmrepquota is reporting no files owned by any of the users in > question. I?ll see if `find` disagrees. > They have the default fileset > user quotas applied, so they?re not users we?ve edited to grant quota > extensions to. We have had a problem (which IBM have acknowledged, iirc) > whereby it is not possible to reset a user?s quota back to the default if > it has been modified, perhaps this is related? I?ll see if `find` turns > anything up or I?ll raise a ticket with IBM and see what they think. > > I?ve pulled out a single example, but all 75 users I have are the same. > > mmrepquota gpfs | grep 8695 > 8695 nbu USR 0 0 5368709120 0 > none | 0 0 0 0 none > 8695 bb USR 0 0 1073741824 0 > none | 0 0 0 0 none > While the number of files and usage is zero look at those "in doubt" numbers. Until these also fall to zero then the users are not going to disappear from the quota reporting would be my guess. Quite why the "in doubt" numbers are still so large is another question. I have vague recollections of this happening to me when I deleted large amounts of data belonging to a user down to zero when I was clearing the file system up I mentioned before. Though to be honest most of my clearing up was identifying who the files really belonged to (there had in the distance past been a change of usernames; gone from local usernames to using the university wide ones and not everyone had claimed their files. All related to a move to using Active Directory) and doing chown's on the data. I think what happens is when the file number goes to zero the quota system stops updating for that user and if there is anything "in doubt" it never gets updated and sticks around forever. Might be worth creating a couple of files for the user in the appropriate filesets and then give it a bit of time and see if the output of mmrepquota matches what you believe is the real case. If this works and the "in doubt" number goes to zero I would at this point do a chown to a different user that is not going away and then delete the files. Something else to consider is that they might be in an ACL somewhere which is confusing the quota system. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From peserocka at gmail.com Thu Nov 27 10:01:55 2014 From: peserocka at gmail.com (P Serocka) Date: Thu, 27 Nov 2014 18:01:55 +0800 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> Message-ID: Any chance to run mmcheckquota? which should remove all "doubt"... On 2014 Nov 27. md, at 17:47 st, Jonathan Buzzard wrote: > On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: >> Hmm, mmrepquota is reporting no files owned by any of the users in >> question. I?ll see if `find` disagrees. 
>> They have the default fileset >> user quotas applied, so they?re not users we?ve edited to grant quota >> extensions to. We have had a problem (which IBM have acknowledged, iirc) >> whereby it is not possible to reset a user?s quota back to the default if >> it has been modified, perhaps this is related? I?ll see if `find` turns >> anything up or I?ll raise a ticket with IBM and see what they think. >> >> I?ve pulled out a single example, but all 75 users I have are the same. >> >> mmrepquota gpfs | grep 8695 >> 8695 nbu USR 0 0 5368709120 0 >> none | 0 0 0 0 none >> 8695 bb USR 0 0 1073741824 0 >> none | 0 0 0 0 none >> > > While the number of files and usage is zero look at those "in doubt" > numbers. Until these also fall to zero then the users are not going to > disappear from the quota reporting would be my guess. Quite why the "in > doubt" numbers are still so large is another question. I have vague > recollections of this happening to me when I deleted large amounts of > data belonging to a user down to zero when I was clearing the file > system up I mentioned before. Though to be honest most of my clearing up > was identifying who the files really belonged to (there had in the > distance past been a change of usernames; gone from local usernames to > using the university wide ones and not everyone had claimed their files. > All related to a move to using Active Directory) and doing chown's on > the data. > > I think what happens is when the file number goes to zero the quota > system stops updating for that user and if there is anything "in doubt" > it never gets updated and sticks around forever. > > Might be worth creating a couple of files for the user in the > appropriate filesets and then give it a bit of time and see if the > output of mmrepquota matches what you believe is the real case. If this > works and the "in doubt" number goes to zero I would at this point do a > chown to a different user that is not going away and then delete the > files. > > Something else to consider is that they might be in an ACL somewhere > which is confusing the quota system. > > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Thu Nov 27 10:02:03 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Nov 2014 10:02:03 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> Message-ID: <1417082523.4171.104.camel@buzzard.phy.strath.ac.uk> On Thu, 2014-11-27 at 09:47 +0000, Jonathan Buzzard wrote: > On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: > > Hmm, mmrepquota is reporting no files owned by any of the users in > > question. I?ll see if `find` disagrees. > > They have the default fileset > > user quotas applied, so they?re not users we?ve edited to grant quota > > extensions to. We have had a problem (which IBM have acknowledged, iirc) > > whereby it is not possible to reset a user?s quota back to the default if > > it has been modified, perhaps this is related? I?ll see if `find` turns > > anything up or I?ll raise a ticket with IBM and see what they think. 
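If these turn out to be explicit per-user limits rather than inherited defaults, something along these lines should clear them. The uid and file system name are just taken from your example, and depending on the code level you may need a user name rather than a raw uid:

# list user quota entries with numeric ids to spot the leftovers
mmrepquota -u -n gpfs

# put uid 8695 back onto the default limits, dropping the explicit entry
mmedquota -d -u 8695

# recount usage so any stale in-doubt values are cleared as well
mmcheckquota gpfs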
> > > > I?ve pulled out a single example, but all 75 users I have are the same. > > > > mmrepquota gpfs | grep 8695 > > 8695 nbu USR 0 0 5368709120 0 > > none | 0 0 0 0 none > > 8695 bb USR 0 0 1073741824 0 > > none | 0 0 0 0 none > > > > While the number of files and usage is zero look at those "in doubt" > numbers. Ignore that those are quota numbers. Hard when the column headings are missing. Anyway a "Homer Simpson" momentum coming up... Simple answer really remove the quotas for those users in those file sets (I am presuming they are per fileset user hard limits). They are sticking around in mmrepquota because they have a hard limit set. D'oh! JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From peserocka at gmail.com Thu Nov 27 10:06:31 2014 From: peserocka at gmail.com (P Serocka) Date: Thu, 27 Nov 2014 18:06:31 +0800 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1417082523.4171.104.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> <1417082523.4171.104.camel@buzzard.phy.strath.ac.uk> Message-ID: <44A03A01-4010-4210-8892-2AE37451EEFA@gmail.com> ;-) Ignore my other message on mmcheckquota then. On 2014 Nov 27. md, at 18:02 st, Jonathan Buzzard wrote: > On Thu, 2014-11-27 at 09:47 +0000, Jonathan Buzzard wrote: >> On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: >>> Hmm, mmrepquota is reporting no files owned by any of the users in >>> question. I?ll see if `find` disagrees. >>> They have the default fileset >>> user quotas applied, so they?re not users we?ve edited to grant quota >>> extensions to. We have had a problem (which IBM have acknowledged, iirc) >>> whereby it is not possible to reset a user?s quota back to the default if >>> it has been modified, perhaps this is related? I?ll see if `find` turns >>> anything up or I?ll raise a ticket with IBM and see what they think. >>> >>> I?ve pulled out a single example, but all 75 users I have are the same. >>> >>> mmrepquota gpfs | grep 8695 >>> 8695 nbu USR 0 0 5368709120 0 >>> none | 0 0 0 0 none >>> 8695 bb USR 0 0 1073741824 0 >>> none | 0 0 0 0 none >>> >> >> While the number of files and usage is zero look at those "in doubt" >> numbers. > > Ignore that those are quota numbers. Hard when the column headings are > missing. > > Anyway a "Homer Simpson" momentum coming up... > > Simple answer really remove the quotas for those users in those file > sets (I am presuming they are per fileset user hard limits). They are > sticking around in mmrepquota because they have a hard limit set. D'oh! > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sdinardo at ebi.ac.uk Wed Nov 5 10:15:59 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 05 Nov 2014 10:15:59 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs Message-ID: <5459F8DF.2090806@ebi.ac.uk> Hello again, to understand better GPFS, recently i build up an test gpfs cluster using some old hardware that was going to be retired. THe storage was SAN devices, so instead to use native raids I went for the old school gpfs. 
the configuration is basically: 3x servers 3x san storages 2x san switches I did no zoning, so all the servers can see all the LUNs, but on nsd creation I gave each LUN a primary, secondary and third server. with the following rule: STORAGE primary secondary tertiary storage1 server1 server2 server3 storage2 server2 server3 server1 storage3 server3 server1 server2 looking at the mmcrnsd, it was my understanding that the primary server is the one that wrote on the NSD unless it fails, then the following server take the ownership of the lun. Now come the question: when i did from server 1 a dd surprisingly i discovered that server1 was writing to all the luns. the other 2 server was doing nothing. this behaviour surprises me because on GSS only the RG owner can write, so one server "ask" the other server to write to his own RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each server. While i understand that the situation it's different I'm puzzled about the fact that all the servers seems able to write to all the luns. SAN deviced usually should be connected to one server only, as paralled access could create data corruption. In environments where you connect a SAN to multiple servers ( example VMWARE cloud) its softeware task to avoid data overwriting between server ( and data corruption ). Honestly, what i was expecting is: server1 writing on his own luns, and data traffic ( ethernet) to the other 2 server , basically asking *them* to write on the other luns. I dont know if this behaviour its normal or not. I triied to find a documentation about that, but could not find any. Could somebody tell me if this _/"every server write to all the luns"/_ its intended or not? Thanks in advance, Salvatore -------------- next part -------------- An HTML attachment was scrubbed... URL: From viccornell at gmail.com Wed Nov 5 10:22:38 2014 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 5 Nov 2014 10:22:38 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> Hi Salvatore, If you are doing the IO on the NSD server itself and it can see all of the NSDs it will use its "local? access to write to the LUNS. You need some GPFS clients to see the workload spread across all of the NSD servers. Vic > On 5 Nov 2014, at 10:15, Salvatore Di Nardo wrote: > > Hello again, > to understand better GPFS, recently i build up an test gpfs cluster using some old hardware that was going to be retired. THe storage was SAN devices, so instead to use native raids I went for the old school gpfs. the configuration is basically: > > 3x servers > 3x san storages > 2x san switches > > I did no zoning, so all the servers can see all the LUNs, but on nsd creation I gave each LUN a primary, secondary and third server. with the following rule: > > STORAGE > primary > secondary > tertiary > storage1 > server1 > server2 server3 > storage2 server2 server3 server1 > storage3 server3 server1 server2 > > looking at the mmcrnsd, it was my understanding that the primary server is the one that wrote on the NSD unless it fails, then the following server take the ownership of the lun. > > Now come the question: > when i did from server 1 a dd surprisingly i discovered that server1 was writing to all the luns. the other 2 server was doing nothing. 
this behaviour surprises me because on GSS only the RG owner can write, so one server "ask" the other server to write to his own RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each server. While i understand that the situation it's different I'm puzzled about the fact that all the servers seems able to write to all the luns. > > SAN deviced usually should be connected to one server only, as paralled access could create data corruption. In environments where you connect a SAN to multiple servers ( example VMWARE cloud) its softeware task to avoid data overwriting between server ( and data corruption ). > > Honestly, what i was expecting is: server1 writing on his own luns, and data traffic ( ethernet) to the other 2 server , basically asking them to write on the other luns. I dont know if this behaviour its normal or not. I triied to find a documentation about that, but could not find any. > > Could somebody tell me if this "every server write to all the luns" its intended or not? > > Thanks in advance, > Salvatore > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Wed Nov 5 10:25:07 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Wed, 05 Nov 2014 11:25:07 +0100 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <5459FB03.8080801@ugent.be> yes, this behaviour is normal, and a bit annoying sometimes, but GPFS doesn't really like (or isn't designed) to run stuff on the NSDs directly. the GSS probably send the data to the other NSD to distribute the (possible) compute cost from the raid, where there is none for regular LUN access. (but you also shouldn't be running on the GSS NSDs ;) stijn On 11/05/2014 11:15 AM, Salvatore Di Nardo wrote: > Hello again, > to understand better GPFS, recently i build up an test gpfs cluster > using some old hardware that was going to be retired. THe storage was > SAN devices, so instead to use native raids I went for the old school > gpfs. the configuration is basically: > > 3x servers > 3x san storages > 2x san switches > > I did no zoning, so all the servers can see all the LUNs, but on nsd > creation I gave each LUN a primary, secondary and third server. with the > following rule: > > STORAGE > primary > secondary > tertiary > storage1 > server1 > server2 server3 > storage2 server2 server3 server1 > storage3 server3 server1 server2 > > > > looking at the mmcrnsd, it was my understanding that the primary server > is the one that wrote on the NSD unless it fails, then the following > server take the ownership of the lun. > > Now come the question: > when i did from server 1 a dd surprisingly i discovered that server1 was > writing to all the luns. the other 2 server was doing nothing. this > behaviour surprises me because on GSS only the RG owner can write, so > one server "ask" the other server to write to his own RG's.In fact on > GSS can be seen a lot of ETH traffic and io/s on each server. While i > understand that the situation it's different I'm puzzled about the fact > that all the servers seems able to write to all the luns. > > SAN deviced usually should be connected to one server only, as paralled > access could create data corruption. 
In environments where you connect a > SAN to multiple servers ( example VMWARE cloud) its softeware task to > avoid data overwriting between server ( and data corruption ). > > Honestly, what i was expecting is: server1 writing on his own luns, and > data traffic ( ethernet) to the other 2 server , basically asking *them* > to write on the other luns. I dont know if this behaviour its normal or > not. I triied to find a documentation about that, but could not find any. > > Could somebody tell me if this _/"every server write to all the luns"/_ > its intended or not? > > Thanks in advance, > Salvatore > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From kgunda at in.ibm.com Wed Nov 5 10:25:07 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Wed, 5 Nov 2014 15:55:07 +0530 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: In case of SAN connectivity, all nodes can write to disks. This avoids going over the network to get to disks. Only when local access isn't present either due to connectivity or zoning will it use the defined NSD server. If there is a need to have the node always use a NSD server, you can enforce it via mount option -o usensdserver=always If the first nsd server is down, it will use the next NSD server in the list. In general NSD servers are a priority list of servers rather than a primary/secondary config which is the case when using native raid. Also note that multiple nodes accessing the same disk will not cause corruption as higher level token mgmt in GPFS will take care of data consistency. Regards Kalyan C Gunda STSM, Elastic Storage Development Member of The IBM Academy of Technology EGL D Block, Bangalore From: Salvatore Di Nardo To: gpfsug main discussion list Date: 11/05/2014 03:44 PM Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs Sent by: gpfsug-discuss-bounces at gpfsug.org Hello again, to understand better GPFS, recently i build up an test gpfs cluster using some old hardware that was going to be retired. THe storage was SAN devices, so instead to use native raids I went for the old school gpfs. the configuration is basically: 3x servers 3x san storages 2x san switches I did no zoning, so all the servers can see all the LUNs, but on nsd creation I gave each LUN a primary, secondary and third server. with the following rule: |-------------------+---------------+--------------------+---------------| |STORAGE |primary |secondary |tertiary | |-------------------+---------------+--------------------+---------------| |storage1 |server1 |server2 |server3 | |-------------------+---------------+--------------------+---------------| |storage2 |server2 |server3 |server1 | |-------------------+---------------+--------------------+---------------| |storage3 |server3 |server1 |server2 | |-------------------+---------------+--------------------+---------------| looking at the mmcrnsd, it was my understanding that the primary server is the one that wrote on the NSD unless it fails, then the following server take the ownership of the lun. Now come the question: when i did from server 1 a dd surprisingly i discovered that server1 was writing to all the luns. the other 2 server was doing nothing. 
this behaviour surprises me because on GSS only the RG owner can write, so one server "ask" the other server to write to his own RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each server. While i understand that the situation it's different I'm puzzled about the fact that all the servers seems able to write to all the luns. SAN deviced usually should be connected to one server only, as paralled access could create data corruption. In environments where you connect a SAN to multiple servers ( example VMWARE cloud) its softeware task to avoid data overwriting between server ( and data corruption ). Honestly, what? i was expecting is: server1 writing on his own luns, and data traffic ( ethernet) to the other 2 server , basically asking them to write on the other luns. I dont know if this behaviour its normal or not. I triied to find a documentation about that, but could not find any. Could somebody? tell me if this "every server write to all the luns" its intended or not? Thanks in advance, Salvatore_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sdinardo at ebi.ac.uk Wed Nov 5 10:33:57 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 05 Nov 2014 10:33:57 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> References: <5459F8DF.2090806@ebi.ac.uk> <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> Message-ID: <5459FD15.3070105@ebi.ac.uk> I understand that my test its a bit particular because the client was also one of the servers. Usually clients don't have direct access to the storages, but still it made think, hot the things are supposed to work. For example i did another test with 3 dd's, one each server. All the servers was writing to all the luns. In other words a lun was accessed in parallel by 3 servers. Its that a problem, or gpfs manage properly the concurrency and avoid data corruption? I'm asking because i was not expecting a server to write to an NSD he doesn't own, even if its locally available. I thought that the general availablity was for failover, not for parallel access. Regards, Salvatore On 05/11/14 10:22, Vic Cornell wrote: > Hi Salvatore, > > If you are doing the IO on the NSD server itself and it can see all of > the NSDs it will use its "local? access to write to the LUNS. > > You need some GPFS clients to see the workload spread across all of > the NSD servers. > > Vic > > > >> On 5 Nov 2014, at 10:15, Salvatore Di Nardo > > wrote: >> >> Hello again, >> to understand better GPFS, recently i build up an test gpfs cluster >> using some old hardware that was going to be retired. THe storage was >> SAN devices, so instead to use native raids I went for the old school >> gpfs. the configuration is basically: >> >> 3x servers >> 3x san storages >> 2x san switches >> >> I did no zoning, so all the servers can see all the LUNs, but on nsd >> creation I gave each LUN a primary, secondary and third server. with >> the following rule: >> >> STORAGE >> primary >> secondary >> tertiary >> storage1 >> server1 >> server2 server3 >> storage2 server2 server3 server1 >> storage3 server3 server1 server2 >> >> >> >> looking at the mmcrnsd, it was my understanding that the primary >> server is the one that wrote on the NSD unless it fails, then the >> following server take the ownership of the lun. 
>> >> Now come the question: >> when i did from server 1 a dd surprisingly i discovered that server1 >> was writing to all the luns. the other 2 server was doing nothing. >> this behaviour surprises me because on GSS only the RG owner can >> write, so one server "ask" the other server to write to his own >> RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each >> server. While i understand that the situation it's different I'm >> puzzled about the fact that all the servers seems able to write to >> all the luns. >> >> SAN deviced usually should be connected to one server only, as >> paralled access could create data corruption. In environments where >> you connect a SAN to multiple servers ( example VMWARE cloud) its >> softeware task to avoid data overwriting between server ( and data >> corruption ). >> >> Honestly, what i was expecting is: server1 writing on his own luns, >> and data traffic ( ethernet) to the other 2 server , basically asking >> *them* to write on the other luns. I dont know if this behaviour its >> normal or not. I triied to find a documentation about that, but could >> not find any. >> >> Could somebody tell me if this _/"every server write to all the >> luns"/_ its intended or not? >> >> Thanks in advance, >> Salvatore >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Nov 5 10:38:48 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 05 Nov 2014 10:38:48 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <1415183928.3474.4.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-11-05 at 10:15 +0000, Salvatore Di Nardo wrote: [SNIP] > Now come the question: > when i did from server 1 a dd surprisingly i discovered that server1 > was writing to all the luns. the other 2 server was doing nothing. > this behaviour surprises me because on GSS only the RG owner can > write, so one server "ask" the other server to write to his own > RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each > server. While i understand that the situation it's different I'm > puzzled about the fact that all the servers seems able to write to all > the luns. The difference is that in GSS the NSD servers are in effect doing software RAID on the disks. Therefore they and they alone can write to the NSD. In the traditional setup the NSD is on a RAID device on SAN controller and multiple machines are able to access the block device at the same time with token management in GPFS preventing corruption. I guess from a technical perspective you could have the GSS software RAID distributed between the NSD servers, but that would be rather more complex software and it is no surprise IBM have gone down the easy route. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
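To make the earlier points about the server list concrete: a sketch of what the NSD definition and the forced-server mount could look like for the layout Salvatore describes. The device, NSD and filesystem names here are invented, and this uses the stanza format that current mmcrnsd versions accept. The servers= entry is a priority list rather than a hard primary/secondary assignment, and a SAN-attached node will do local I/O unless the mount option tells it otherwise.

    # stanza file for mmcrnsd, one NSD per LUN, with the server list rotated
    %nsd: device=/dev/mapper/lun1 nsd=nsd_storage1 servers=server1,server2,server3 usage=dataAndMetadata
    %nsd: device=/dev/mapper/lun2 nsd=nsd_storage2 servers=server2,server3,server1 usage=dataAndMetadata
    %nsd: device=/dev/mapper/lun3 nsd=nsd_storage3 servers=server3,server1,server2 usage=dataAndMetadata

    mmcrnsd -F nsd.stanza

    # make a SAN-attached node route I/O through its NSD servers anyway
    # (documented spelling is useNSDserver; values: always|asfound|asneeded|never)
    mmmount gpfs0 -o useNSDserver=always -N server1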
From viccornell at gmail.com Wed Nov 5 10:42:22 2014 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 5 Nov 2014 10:42:22 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459FD15.3070105@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> <5459FD15.3070105@ebi.ac.uk> Message-ID: <75801708-F65D-4B39-82CA-6DC4FB5AA6EB@gmail.com> > On 5 Nov 2014, at 10:33, Salvatore Di Nardo wrote: > > I understand that my test its a bit particular because the client was also one of the servers. > Usually clients don't have direct access to the storages, but still it made think, hot the things are supposed to work. > > For example i did another test with 3 dd's, one each server. All the servers was writing to all the luns. > In other words a lun was accessed in parallel by 3 servers. > > Its that a problem, or gpfs manage properly the concurrency and avoid data corruption? Its not a problem if you use locks. Remember the clients - even the ones running on the NSD servers are talking to the filesystem - not to the LUNS/NSDs directly. It is the NSD processes that talk to the NSDs. So loosely speaking it is as if all of the processes you are running were running on a single system with a local filesystem So yes - gpfs is designed to manage the problems created by having a distributed, shared filesystem, and does a pretty good job IMHO. > I'm asking because i was not expecting a server to write to an NSD he doesn't own, even if its locally available. > I thought that the general availablity was for failover, not for parallel access. Bear in mind that GPFS supports a number of access models, one of which is where all of the systems in the cluster have access to all of the disks. So parallel access is most commonly used for failover, but that is not the limit of its capabilities. Vic > > > Regards, > Salvatore > > > > On 05/11/14 10:22, Vic Cornell wrote: >> Hi Salvatore, >> >> If you are doing the IO on the NSD server itself and it can see all of the NSDs it will use its "local? access to write to the LUNS. >> >> You need some GPFS clients to see the workload spread across all of the NSD servers. >> >> Vic >> >> >> >>> On 5 Nov 2014, at 10:15, Salvatore Di Nardo > wrote: >>> >>> Hello again, >>> to understand better GPFS, recently i build up an test gpfs cluster using some old hardware that was going to be retired. THe storage was SAN devices, so instead to use native raids I went for the old school gpfs. the configuration is basically: >>> >>> 3x servers >>> 3x san storages >>> 2x san switches >>> >>> I did no zoning, so all the servers can see all the LUNs, but on nsd creation I gave each LUN a primary, secondary and third server. with the following rule: >>> >>> STORAGE >>> primary >>> secondary >>> tertiary >>> storage1 >>> server1 >>> server2 server3 >>> storage2 server2 server3 server1 >>> storage3 server3 server1 server2 >>> >>> looking at the mmcrnsd, it was my understanding that the primary server is the one that wrote on the NSD unless it fails, then the following server take the ownership of the lun. >>> >>> Now come the question: >>> when i did from server 1 a dd surprisingly i discovered that server1 was writing to all the luns. the other 2 server was doing nothing. this behaviour surprises me because on GSS only the RG owner can write, so one server "ask" the other server to write to his own RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each server. 
While i understand that the situation it's different I'm puzzled about the fact that all the servers seems able to write to all the luns. >>> >>> SAN deviced usually should be connected to one server only, as paralled access could create data corruption. In environments where you connect a SAN to multiple servers ( example VMWARE cloud) its softeware task to avoid data overwriting between server ( and data corruption ). >>> >>> Honestly, what i was expecting is: server1 writing on his own luns, and data traffic ( ethernet) to the other 2 server , basically asking them to write on the other luns. I dont know if this behaviour its normal or not. I triied to find a documentation about that, but could not find any. >>> >>> Could somebody tell me if this "every server write to all the luns" its intended or not? >>> >>> Thanks in advance, >>> Salvatore >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Wed Nov 5 10:46:52 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 05 Nov 2014 10:46:52 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <545A001C.1040908@ebi.ac.uk> On 05/11/14 10:25, Kalyan Gunda wrote: > Also note that multiple nodes accessing the same disk will not cause > corruption as higher level token mgmt in GPFS will take care of data > consistency. This is exactly what I wanted to be sure. Thanks! From ewahl at osc.edu Wed Nov 5 13:56:38 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 5 Nov 2014 13:56:38 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <545A001C.1040908@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> , <545A001C.1040908@ebi.ac.uk> Message-ID: You can designate how many of the nodes do token management as well. mmlscluster should show which are "manager"s. Under some circumstances you may want to increase the defaults on heavily used file systems using mmchnode, especially with few NSDs and many writers. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Wednesday, November 05, 2014 5:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] maybe a silly question about "old school" gpfs On 05/11/14 10:25, Kalyan Gunda wrote: > Also note that multiple nodes accessing the same disk will not cause > corruption as higher level token mgmt in GPFS will take care of data > consistency. This is exactly what I wanted to be sure. Thanks! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pavel.pokorny at datera.cz Fri Nov 7 11:15:34 2014 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Fri, 7 Nov 2014 12:15:34 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? 
Message-ID: Hello to all, I would like to ask question about pagepool and protection of data written through pagepool. Is there a possibility of loosing data written to GPFS in situation that data are stored in pagepool but still not written to disks? I think that for regular file system work this can be solved using GPFS journal. What about using GPFS as a NFS store for VMware datastores? Thank you for your answers, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o. | Ovocn? trh 580/2 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhorrocks-barlow at ocf.co.uk Wed Nov 5 10:47:06 2014 From: lhorrocks-barlow at ocf.co.uk (Laurence Horrocks- Barlow) Date: Wed, 5 Nov 2014 10:47:06 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459FD15.3070105@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> <5459FD15.3070105@ebi.ac.uk> Message-ID: <545A002A.4080301@ocf.co.uk> Hi Salvatore, GSS and GPFS systems are different beasts. In a traditional GPFS configuration I would expect any NSD server to write to any/all LUN's that it can see as a local disk providing it's part of the same FS. In GSS there is effectively a software RAID level added on top of the disks, with this I would expect only the RG owner to write down to the vdisk. As for corruption, GPFS uses a token system to manage access to LUN's, Metadata, etc. Kind Regards, Laurence Horrocks-Barlow Linux Systems Software Engineer OCF plc Tel: +44 (0)114 257 2200 Fax: +44 (0)114 257 0022 Web: www.ocf.co.uk Blog: blog.ocf.co.uk Twitter: @ocfplc OCF plc is a company registered in England and Wales. Registered number 4132533, VAT number GB 780 6803 14. Registered office address: OCF plc, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG. This message is private and confidential. If you have received this message in error, please notify us and remove it from your system. On 11/05/2014 10:33 AM, Salvatore Di Nardo wrote: > I understand that my test its a bit particular because the client was > also one of the servers. > Usually clients don't have direct access to the storages, but still it > made think, hot the things are supposed to work. > > For example i did another test with 3 dd's, one each server. All the > servers was writing to all the luns. > In other words a lun was accessed in parallel by 3 servers. > > Its that a problem, or gpfs manage properly the concurrency and avoid > data corruption? > I'm asking because i was not expecting a server to write to an NSD he > doesn't own, even if its locally available. > I thought that the general availablity was for failover, not for > parallel access. > > > Regards, > Salvatore > > > > On 05/11/14 10:22, Vic Cornell wrote: >> Hi Salvatore, >> >> If you are doing the IO on the NSD server itself and it can see all >> of the NSDs it will use its "local? access to write to the LUNS. >> >> You need some GPFS clients to see the workload spread across all of >> the NSD servers. >> >> Vic >> >> >> >>> On 5 Nov 2014, at 10:15, Salvatore Di Nardo >> > wrote: >>> >>> Hello again, >>> to understand better GPFS, recently i build up an test gpfs cluster >>> using some old hardware that was going to be retired. THe storage >>> was SAN devices, so instead to use native raids I went for the old >>> school gpfs. 
the configuration is basically: >>> >>> 3x servers >>> 3x san storages >>> 2x san switches >>> >>> I did no zoning, so all the servers can see all the LUNs, but on nsd >>> creation I gave each LUN a primary, secondary and third server. with >>> the following rule: >>> >>> STORAGE >>> primary >>> secondary >>> tertiary >>> storage1 >>> server1 >>> server2 server3 >>> storage2 server2 server3 server1 >>> storage3 server3 server1 server2 >>> >>> >>> >>> looking at the mmcrnsd, it was my understanding that the primary >>> server is the one that wrote on the NSD unless it fails, then the >>> following server take the ownership of the lun. >>> >>> Now come the question: >>> when i did from server 1 a dd surprisingly i discovered that server1 >>> was writing to all the luns. the other 2 server was doing nothing. >>> this behaviour surprises me because on GSS only the RG owner can >>> write, so one server "ask" the other server to write to his own >>> RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on >>> each server. While i understand that the situation it's different >>> I'm puzzled about the fact that all the servers seems able to write >>> to all the luns. >>> >>> SAN deviced usually should be connected to one server only, as >>> paralled access could create data corruption. In environments where >>> you connect a SAN to multiple servers ( example VMWARE cloud) its >>> softeware task to avoid data overwriting between server ( and data >>> corruption ). >>> >>> Honestly, what i was expecting is: server1 writing on his own luns, >>> and data traffic ( ethernet) to the other 2 server , basically >>> asking *them* to write on the other luns. I dont know if this >>> behaviour its normal or not. I triied to find a documentation about >>> that, but could not find any. >>> >>> Could somebody tell me if this _/"every server write to all the >>> luns"/_ its intended or not? >>> >>> Thanks in advance, >>> Salvatore >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dhildeb at us.ibm.com Fri Nov 7 22:42:06 2014 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Fri, 7 Nov 2014 23:42:06 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? In-Reply-To: References: Message-ID: Hi Paul, GPFS correctly implements POSIX semantics and NFS close-to-open semantics. Its a little complicated, but effectively what this means is that when the application issues certain calls to ensure data/metadata is "stable" (e.g., fsync), then it is guaranteed to be stable. It also controls ordering between nodes among many other things. As part of making sure data is stable, the GPFS recovery journal is used in a variety of instances. With VMWare ESX using NFS to GPFS, then the same thing occurs, except the situation is even more simple since every write request will have the 'stable' flag set, ensuring it does writethrough to the storage system. Dean Hildebrand IBM Almaden Research Center From: Pavel Pokorny To: gpfsug-discuss at gpfsug.org Date: 11/07/2014 03:15 AM Subject: [gpfsug-discuss] GPFS - pagepool data protection? 
Sent by: gpfsug-discuss-bounces at gpfsug.org Hello to all, I would like to ask question about pagepool and protection of data written through pagepool. Is there a possibility of loosing data written to GPFS in situation that data are stored in pagepool but still not written to disks? I think that for regular file system work this can be solved using GPFS journal. What about using GPFS as a NFS store for VMware datastores? Thank you for your answers, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jamiedavis at us.ibm.com Sat Nov 8 23:13:17 2014 From: jamiedavis at us.ibm.com (James Davis) Date: Sat, 8 Nov 2014 18:13:17 -0500 Subject: [gpfsug-discuss] Hi everybody Message-ID: Hey all, My name is Jamie Davis and I work for IBM on the GPFS test team. I'm interested in learning more about how customers use GPFS and what typical questions and issues are like, and I thought joining this mailing list would be a good start. If my presence seems inappropriate or makes anyone uncomfortable I can leave the list. --- I don't know how often GPFS users look in /usr/lpp/mmfs/samples...but while I'm sending a mass email, I thought I'd take a moment to point anyone running GPFS 4.1.0.4 to /usr/lpp/mmfs/samples/ilm/mmfind. mmfind is basically a find-esque wrapper around mmapplypolicy that I wrote in response to complaints I've heard about the learning curve associated with writing policies for mmapplypolicy. Since it's in samples, use-at-your-own-risk and I make no promise that everything works correctly. The -skipPolicy and -saveTmpFiles flags will do everything but actually run mmapplypolicy -- I suggest you double-check its work before you run it on a production system. Please send me any comments on it if you give it a try! Jamie Davis GPFS Test IBM -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Mon Nov 10 16:18:24 2014 From: chair at gpfsug.org (Jez Tucker) Date: Mon, 10 Nov 2014 16:18:24 +0000 Subject: [gpfsug-discuss] SC 14 and storagebeers events this week Message-ID: <5460E550.8020705@gpfsug.org> Hi all Just a quick reminder that the IBM GPFS User Group is at SC '14 in New Orleans Nov 17th. Also, there's a social in London W1 - #storagebeers on Nov 13th. For more info on both of these, please see the main website: www.gpfsug.org Best, Jez -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Tue Nov 11 13:59:38 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Tue, 11 Nov 2014 13:59:38 +0000 Subject: [gpfsug-discuss] storagebeers postponed Message-ID: <5462164A.70607@gpfsug.org> Hi all I've just received notification that #storagebeers, due to happen 13th Nov, has unfortunately had to be postponed. I'll update you all with a new date when I receive it. 
Very best, Jez From jez at rib-it.org Tue Nov 11 16:49:48 2014 From: jez at rib-it.org (Jez Tucker) Date: Tue, 11 Nov 2014 16:49:48 +0000 Subject: [gpfsug-discuss] Hi everybody In-Reply-To: References: Message-ID: <54623E2C.2070903@rib-it.org> Hi Jamie, You're indeed very welcome. A few of the IBM devs are list members and their presence is appreciated. I suggest if you want to know more regarding use cases etc., ask some pointed questions. Discussion is good. Jez On 08/11/14 23:13, James Davis wrote: > > Hey all, > > My name is Jamie Davis and I work for IBM on the GPFS test team. I'm > interested in learning more about how customers use GPFS and what > typical questions and issues are like, and I thought joining this > mailing list would be a good start. If my presence seems inappropriate > or makes anyone uncomfortable I can leave the list. > > --- > > I don't know how often GPFS users look in /usr/lpp/mmfs/samples...but > while I'm sending a mass email, I thought I'd take a moment to point > anyone running GPFS 4.1.0.4 to /usr/lpp/mmfs/samples/ilm/mmfind. > mmfind is basically a find-esque wrapper around mmapplypolicy that I > wrote in response to complaints I've heard about the learning curve > associated with writing policies for mmapplypolicy. Since it's in > samples, use-at-your-own-risk and I make no promise that everything > works correctly. The -skipPolicy and -saveTmpFiles flags will do > everything but actually run mmapplypolicy -- I suggest you > double-check its work before you run it on a production system. > > Please send me any comments on it if you give it a try! > > Jamie Davis > GPFS Test > IBM > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavel.pokorny at datera.cz Wed Nov 12 12:20:57 2014 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Wed, 12 Nov 2014 13:20:57 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID: Hi, thanks. A I understand the write process to GPFS filesystem: 1. Application on a node makes write call 2. Token Manager stuff is done to coordinate the required-byte-range 3. mmfsd gets metadata from the file?s metanode 4. mmfsd acquires a buffer from the page pool 5. Data is moved from application data buffer to page pool buffer 6. VSD layer copies data from the page pool to the send pool and so on. What I am looking at and want to clarify is step 5. Situation when data is moved to page pool. What happen if the server will crash at tjis point? Will GPFS use journal to get to stable state? Thank you, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o. | Ovocn? trh 580/2 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz On Sat, Nov 8, 2014 at 1:00 PM, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: GPFS - pagepool data protection? 
(Dean Hildebrand) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 7 Nov 2014 23:42:06 +0100 > From: Dean Hildebrand > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS - pagepool data protection? > Message-ID: > < > OF1ED92A57.DD700837-ONC1257D89.007C4EF1-88257D89.007CB453 at us.ibm.com> > Content-Type: text/plain; charset="iso-8859-1" > > > Hi Paul, > > GPFS correctly implements POSIX semantics and NFS close-to-open semantics. > Its a little complicated, but effectively what this means is that when the > application issues certain calls to ensure data/metadata is "stable" (e.g., > fsync), then it is guaranteed to be stable. It also controls ordering > between nodes among many other things. As part of making sure data is > stable, the GPFS recovery journal is used in a variety of instances. > > With VMWare ESX using NFS to GPFS, then the same thing occurs, except the > situation is even more simple since every write request will have the > 'stable' flag set, ensuring it does writethrough to the storage system. > > Dean Hildebrand > IBM Almaden Research Center > > > > > From: Pavel Pokorny > To: gpfsug-discuss at gpfsug.org > Date: 11/07/2014 03:15 AM > Subject: [gpfsug-discuss] GPFS - pagepool data protection? > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > Hello to all, > I would like to ask question about pagepool and protection of data written > through pagepool. > Is there a possibility of loosing data written to GPFS in situation that > data are stored in pagepool but still not written to disks? > I think that for regular file system work this can be solved using GPFS > journal. What about using GPFS as a NFS store for VMware datastores? > Thank you for your answers, > Pavel > -- > Ing. Pavel Pokorn? > DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic > www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.html > > > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: graycol.gif > Type: image/gif > Size: 105 bytes > Desc: not available > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.gif > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 34, Issue 7 > ********************************************* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Wed Nov 12 14:05:03 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Wed, 12 Nov 2014 15:05:03 +0100 Subject: [gpfsug-discuss] IBM Software Defined Infrastructure Roadshow 2014 (Frankfurt, London & Paris) Message-ID: FYI: IBM Software Defined Infrastructure Roadshow 2014 (Frankfurt 02. Dec 2014, London 03. Dec 2014 & Paris 04. Dec 2014) https://www-950.ibm.com/events/wwe/grp/grp019.nsf/v17_events?openform&lp=platform_computing_roadshow&locale=en_GB P.S. The German GPFS technical team will be available for discussions in Frankfurt. 
Feel free to contact me. -frank- Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From dhildeb at us.ibm.com Sat Nov 15 20:31:53 2014 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Sat, 15 Nov 2014 12:31:53 -0800 Subject: [gpfsug-discuss] GPFS - pagepool data protection? In-Reply-To: References: Message-ID: Hi Pavel, You are more or less right in your description, but the key that I tried to convey in my first email is that GPFS only obey's POSIX. So your question can be answered by looking at how your application performs the write and does your application ask to make the data live only in the pagepool or on stable storage. By default posix says that file create and writes are unstable, so just doing a write puts it in the pagepool and will be lost if a crash occurs immediately after. To make it stable, the application must do something in posix to make it stable, of which there are many ways to do so, including but not limited to O_SYNC, DIO, some form of fsync post write, etc, etc... Dean Hildebrand IBM Almaden Research Center From: Pavel Pokorny To: gpfsug-discuss at gpfsug.org Date: 11/12/2014 04:21 AM Subject: Re: [gpfsug-discuss] GPFS - pagepool data protection? Sent by: gpfsug-discuss-bounces at gpfsug.org Hi, thanks. A I understand the write process to GPFS filesystem: 1.?Application on a node makes write call 2.?Token Manager stuff is done to coordinate the required-byte-range 3.?mmfsd gets metadata from the file?s metanode 4.?mmfsd acquires a buffer from the page pool 5.?Data is moved from application data buffer to page pool buffer 6. VSD layer copies data from the page pool to the send pool ?and so on. What I am looking at and want to clarify is step 5. Situation when data is moved to page pool. What happen if the server will crash at tjis point? Will GPFS use journal to get to stable state? Thank you, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz On Sat, Nov 8, 2014 at 1:00 PM, wrote: Send gpfsug-discuss mailing list submissions to ? ? ? ? gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit ? ? ? ? http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to ? ? ? ? gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at ? ? ? ? gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: ? ?1. Re: GPFS - pagepool data protection? (Dean Hildebrand) ---------------------------------------------------------------------- Message: 1 Date: Fri, 7 Nov 2014 23:42:06 +0100 From: Dean Hildebrand To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID: ? ? ? ? < OF1ED92A57.DD700837-ONC1257D89.007C4EF1-88257D89.007CB453 at us.ibm.com> Content-Type: text/plain; charset="iso-8859-1" Hi Paul, GPFS correctly implements POSIX semantics and NFS close-to-open semantics. Its a little complicated, but effectively what this means is that when the application issues certain calls to ensure data/metadata is "stable" (e.g., fsync), then it is guaranteed to be stable.? It also controls ordering between nodes among many other things.? 
As part of making sure data is stable, the GPFS recovery journal is used in a variety of instances. With VMWare ESX using NFS to GPFS, then the same thing occurs, except the situation is even more simple since every write request will have the 'stable' flag set, ensuring it does writethrough to the storage system. Dean Hildebrand IBM Almaden Research Center From:? ?Pavel Pokorny To:? ? ?gpfsug-discuss at gpfsug.org Date:? ?11/07/2014 03:15 AM Subject:? ? ? ? [gpfsug-discuss] GPFS - pagepool data protection? Sent by:? ? ? ? gpfsug-discuss-bounces at gpfsug.org Hello to all, I would like to ask question about pagepool and protection of data written through pagepool. Is there a possibility of loosing data written to GPFS in situation that data are stored in pagepool but still not written to disks? I think that for regular file system work this can be solved using GPFS journal. What about using GPFS as a NFS store for VMware datastores? Thank you for your answers, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.html > -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.gif > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 34, Issue 7 ********************************************* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From seanlee at tw.ibm.com Mon Nov 17 09:49:39 2014 From: seanlee at tw.ibm.com (Sean S Lee) Date: Mon, 17 Nov 2014 17:49:39 +0800 Subject: [gpfsug-discuss] GPFS - pagepool data protection? In-Reply-To: References: Message-ID: Hi Pavel, Most popular filesystems work that way. Write buffering improves the performance at the expense of some risk. Today most applications and all modern OS correctly handle "crash consistency", meaning they can recover from uncommitted writes. If you have data which absolutely cannot tolerate any "in-flight" data loss, it requires significant planning and resources on multiple levels, but as far as GPFS is concerned you could create a small file system and data (VMDK's) or serve GPFS or cNFS (mount GPFS with "syncfs", mount NFS with sync,no_wdelay) to VM clients from those filesystems. Your VM OS (VMDK) could be on a regular GPFS file system and your app data and logs could be on a small GPFS with synchronous writes. Regards Sean -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pavel.pokorny at datera.cz Mon Nov 17 12:49:26 2014 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Mon, 17 Nov 2014 13:49:26 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID: Hello, thanks you for all the answers, It is more clear now. Regards, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o. | Ovocn? trh 580/2 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz On Mon, Nov 17, 2014 at 1:00 PM, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. GPFS - pagepool data protection? (Sean S Lee) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 17 Nov 2014 17:49:39 +0800 > From: Sean S Lee > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] GPFS - pagepool data protection? > Message-ID: > < > OF20A72494.9E59B93F-ON48257D93.00350BA6-48257D93.0035F912 at tw.ibm.com> > Content-Type: text/plain; charset="us-ascii" > > > Hi Pavel, > > Most popular filesystems work that way. > > Write buffering improves the performance at the expense of some risk. > Today most applications and all modern OS correctly handle "crash > consistency", meaning they can recover from uncommitted writes. > > If you have data which absolutely cannot tolerate any "in-flight" data > loss, it requires significant planning and resources on multiple levels, > but as far as GPFS is concerned you could create a small file system and > data (VMDK's) or serve GPFS or cNFS (mount GPFS with "syncfs", mount NFS > with sync,no_wdelay) to VM clients from those filesystems. > Your VM OS (VMDK) could be on a regular GPFS file system and your app data > and logs could be on a small GPFS with synchronous writes. > > Regards > Sean > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141117/1eb905cc/attachment-0001.html > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 34, Issue 13 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From orlando.richards at ed.ac.uk Wed Nov 19 16:35:44 2014 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Wed, 19 Nov 2014 16:35:44 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests Message-ID: <546CC6E0.1010800@ed.ac.uk> Hi folks, Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". 
--- Orlando -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From S.J.Thompson at bham.ac.uk Wed Nov 19 18:36:30 2014 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 19 Nov 2014 18:36:30 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: <546CC6E0.1010800@ed.ac.uk> References: <546CC6E0.1010800@ed.ac.uk> Message-ID: I was asking this question at the gpfs forum on Monday at sc, but there didn't seem to be much in how wr could do it. One of the suggestions was to basically use nfs, or there is the Manilla compnents of Openstack coming, but still that isn't really true gpfs access. I did wonder about virtio, but whether that would work with gpfs passed from the hosting system. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Orlando Richards [orlando.richards at ed.ac.uk] Sent: 19 November 2014 16:35 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] GPFS inside OpenStack guests Hi folks, Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". --- Orlando -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Wed Nov 19 19:00:50 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 19 Nov 2014 11:00:50 -0800 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: <546CC6E0.1010800@ed.ac.uk> References: <546CC6E0.1010800@ed.ac.uk> Message-ID: technically there are multiple ways to do this. 1. you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) 2. you attach the physical disks as virtio block devices 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... Sven On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards wrote: > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to > connect to an existing (traditional, "bare metal") GPFS filesystem owning > cluster? > > This is not using GPFS for openstack block/image storage - but using GPFS > as a "NAS" service, with openstack guest instances as as a "GPFS client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in Scotland, > with registration number SC005336. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Nov 19 19:03:55 2014 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 19 Nov 2014 19:03:55 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk>, Message-ID: Yes, what about the random name nature of a vm image? For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Sven Oehme [oehmes at gmail.com] Sent: 19 November 2014 19:00 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS inside OpenStack guests technically there are multiple ways to do this. 1. you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) 2. you attach the physical disks as virtio block devices 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... Sven On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards > wrote: Hi folks, Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". --- Orlando -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chekh at stanford.edu Wed Nov 19 19:37:50 2014 From: chekh at stanford.edu (Alex Chekholko) Date: Wed, 19 Nov 2014 11:37:50 -0800 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk>, Message-ID: <546CF18E.3010802@stanford.edu> Just make the new VMs NFS clients, no? It's so much simpler and the performance is not much less. But you do need to run CNFS in the GPFS cluster. On 11/19/14 11:03 AM, Simon Thompson (Research Computing - IT Services) wrote: > > Yes, what about the random name nature of a vm image? > > For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? > > And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? 
> > Simon > ________________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Sven Oehme [oehmes at gmail.com] > Sent: 19 November 2014 19:00 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS inside OpenStack guests > > technically there are multiple ways to do this. > > 1. you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) > 2. you attach the physical disks as virtio block devices > 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. > > if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... > > Sven > > > > > > On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards > wrote: > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? > > This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Alex Chekholko chekh at stanford.edu From orlando.richards at ed.ac.uk Wed Nov 19 20:56:32 2014 From: orlando.richards at ed.ac.uk (orlando.richards at ed.ac.uk) Date: Wed, 19 Nov 2014 20:56:32 +0000 (GMT) Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk> Message-ID: On Wed, 19 Nov 2014, Simon Thompson (Research Computing - IT Services) wrote: > > I was asking this question at the gpfs forum on Monday at sc, but there didn't seem to be much in how wr could do it. > > One of the suggestions was to basically use nfs, or there is the Manilla compnents of Openstack coming, but still that isn't really true gpfs access. > NFS should be easy enough - but you can lose a lot of the gpfs good-ness by doing that (acl's, cloning, performance?, etc). > I did wonder about virtio, but whether that would work with gpfs passed from the hosting system. I was more looking for something fairly native - so that we don't have to, for example, start heavily customising the hypervisor stack. In fact - if you're pushing out to a third-party service provider cloud (and that could be your internal organisation's cloud run as a separate service) then you don't have that option at all. I've not dug into virtio much in a basic kvm hypervisor, but one of the guys in EPCC has been trying it out. Initial impressions (once he got it working!) were tarred by terrible performance. I've not caught up with how he got on after that initial look. 
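On the virtio performance point, one thing worth ruling out is the host-side cache setting, since (as Sven noted) virtio disks handed to a GPFS guest need caching fully disabled. A minimal sketch with libvirt/KVM; the guest name and device are invented:

    # attach a shared LUN to a guest as a virtio disk with host caching off
    virsh attach-disk gpfs-vm01 /dev/mapper/lun1 vdb \
        --driver qemu --subdriver raw --cache none --persistent

    # equivalent fragment in the domain XML:
    #   <driver name='qemu' type='raw' cache='none'/>

cache=none opens the backing device with O_DIRECT on the host, so the data is not double-buffered in the host page cache, which is usually the setting you want to benchmark against.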
> > Simon > ________________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Orlando Richards [orlando.richards at ed.ac.uk] > Sent: 19 November 2014 16:35 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] GPFS inside OpenStack guests > > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to > connect to an existing (traditional, "bare metal") GPFS filesystem > owning cluster? > > This is not using GPFS for openstack block/image storage - but using > GPFS as a "NAS" service, with openstack guest instances as as a "GPFS > client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From orlando.richards at ed.ac.uk Wed Nov 19 20:56:38 2014 From: orlando.richards at ed.ac.uk (orlando.richards at ed.ac.uk) Date: Wed, 19 Nov 2014 20:56:38 +0000 (GMT) Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk>, Message-ID: On Wed, 19 Nov 2014, Simon Thompson (Research Computing - IT Services) wrote: > > Yes, what about the random name nature of a vm image? > > > For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? I *think* this bit should be solvable - assuming one can pre-define the range of names the node will have, and can pre-populate your gpfs cluster config with these node names. The guest image should then have the full /var/mmfs tree (pulled from another gpfs node), but with the /var/mmfs/gen/mmfsNodeData file removed. When it starts up, it'll figure out "who" it is and regenerate that file, pull the latest cluster config from the primary config server, and start up. > And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? This bit is where I can see the potential pitfall. OpenStack naturally uses NAT to handle traffic to and from guests - will GPFS cope with nat'ted clients in this way? Fair point on NFS from Alex - but will you get the same multi-threaded performance from NFS compared with GPFS? Also - could you make each hypervisor an NFS server for its guests, thus doing away with the need for CNFS, and removing the potential for the nfs server threads bottlenecking? For instance - if I have 300 worker nodes, and 7 NSD servers - I'd then have 300 NFS servers running, rather than 7 NFS servers. Direct block access to the storage from the hypervisor would also be possible (network configuration permitting), and the NFS traffic would flow only over a "virtual" network within the hypervisor, and so "should" (?) be more efficient. 
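If the per-hypervisor NFS route gets tried, the plumbing itself is small. A sketch only: the export path, fsid and addresses are illustrative, and the export options are the conservative sync/no_wdelay style mentioned earlier:

    # on each hypervisor, which is already a GPFS client with /gpfs/gpfs0 mounted:
    exportfs -o rw,sync,no_wdelay,no_root_squash,fsid=745 192.168.122.0/24:/gpfs/gpfs0

    # inside each guest, over the hypervisor's internal bridge:
    mount -t nfs -o hard,intr 192.168.122.1:/gpfs/gpfs0 /gpfs/gpfs0

That keeps the NFS hop on the hypervisor's virtual network, with the GPFS/SAN traffic handled by the hypervisor itself.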
> > Simon > ________________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Sven Oehme [oehmes at gmail.com] > Sent: 19 November 2014 19:00 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS inside OpenStack guests > > technically there are multiple ways to do this. > > 1. you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) > 2. you attach the physical disks as virtio block devices > 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. > > if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... > > Sven > > > > > > On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards > wrote: > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? > > This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From S.J.Thompson at bham.ac.uk Thu Nov 20 00:20:44 2014 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 20 Nov 2014 00:20:44 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk> Message-ID: On 19/11/2014 14:56, "orlando.richards at ed.ac.uk" wrote: >> >>And how about attaching to the netowkrk as neutron networking uses per >>tenant networks, so how would you actually get access to the gpfs >>cluster? > >This bit is where I can see the potential pitfall. OpenStack naturally >uses NAT to handle traffic to and from guests - will GPFS cope with >nat'ted clients in this way? Well, not necessarily, I was thinking about this and potentially you could create an external shared network which is bound to your GPFS interface, though there?s possible security questions maybe around exposing a real internal network device into a VM. I think there is also a Mellanox driver for the VPI Pro cards which allow you to pass the card through to instances. I can?t remember if that was just acceleration for Ethernet or if it could do IB as well. >Also - could you make each hypervisor an NFS server for its guests, thus >doing away with the need for CNFS, and removing the potential for the nfs >server threads bottlenecking? 
For instance - if I have 300 worker nodes, >and 7 NSD servers - I'd then have 300 NFS servers running, rather than 7 Would you then not need to have 300 server licenses though? Simon From jonathan at buzzard.me.uk Thu Nov 20 10:03:01 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 20 Nov 2014 10:03:01 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk> , Message-ID: <1416477781.4171.23.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-11-19 at 20:56 +0000, orlando.richards at ed.ac.uk wrote: > On Wed, 19 Nov 2014, Simon Thompson (Research Computing - IT Services) > wrote: > > > > > Yes, what about the random name nature of a vm image? > > > > > > For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? > > > I *think* this bit should be solvable - assuming one can pre-define the > range of names the node will have, and can pre-populate your gpfs cluster > config with these node names. The guest image should then have the full > /var/mmfs tree (pulled from another gpfs node), but with the > /var/mmfs/gen/mmfsNodeData file removed. When it starts up, it'll figure > out "who" it is and regenerate that file, pull the latest cluster config > from the primary config server, and start up. It's perfectly solvable with a bit of scripting and putting the cluster into admin mode central. > > > > > And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? > > This bit is where I can see the potential pitfall. OpenStack naturally > uses NAT to handle traffic to and from guests - will GPFS cope with > nat'ted clients in this way? Not going to work with NAT. GPFS has some "funny" ideas about networking, but to put it succinctly all the nodes have to be on the same class A, B or C network. Though it considers every address in a class A network to be on the same network even though you may have divided it up internally into different networks. Consequently the network model in GPFS is broken. You would need to use bridged mode aka FlatNetworking in OpenStacks for this to work, but surely Jan knows all this. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From janfrode at tanso.net Fri Nov 21 19:35:48 2014 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 21 Nov 2014 20:35:48 +0100 Subject: [gpfsug-discuss] Gathering node/fs statistics ? Message-ID: <20141121193548.GA11920@mushkin.tanso.net> I'm considering writing a Performance CoPilot agent (PMDA, Performance Metrics Domain Agent) for GPFS, and would like to collect all/most of the metrics that are already available in the gpfs SNMP agent -- ideally without using SNMP.. So, could someone help me with where to find GPFS performance data? I've noticed "mmfsadm" has a "resetstats" option, but what are these stats / where can I find them? All in mmpmon? Also the GPFS-MIB.txt seems to point at some commands I'm unfamiliar with: -- all other node data from EE "get nodes" command -- Status info from EE "get fs -b" command -- Performance data from mmpmon "gfis" command -- Storage pool table comes from EE "get pools" command -- Storage pool data comes from SDR and EE "get pools" command -- Disk data from EE "get fs" command -- Disk performance data from mmpmon "ds" command: -- From mmpmon nc: Any idea what 'EE "get nodes"' is? And what do they mean by 'mmpmon "gfis"', "nc" or "ds"? 
These commands doesn't work when fed to mmpmon.. -jf From oehmes at gmail.com Fri Nov 21 20:15:16 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 21 Nov 2014 12:15:16 -0800 Subject: [gpfsug-discuss] Gathering node/fs statistics ? In-Reply-To: <20141121193548.GA11920@mushkin.tanso.net> References: <20141121193548.GA11920@mushkin.tanso.net> Message-ID: Hi, you should take a look at the following 3 links : my performance talk about GPFS , take a look at the dstat plugin mentioned in the charts : http://www.gpfsug.org/wp-content/uploads/2014/05/UG10_GPFS_Performance_Session_v10.pdf documentation about the mmpmon interface and use in GPFS : http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_mmpmonch.htm documentation about GSS/ESS/GNR in case you care about this as well and its additional mmpmon commands : http://www-01.ibm.com/support/knowledgecenter/SSFKCN/bl1du14a.pdf Sven On Fri, Nov 21, 2014 at 11:35 AM, Jan-Frode Myklebust wrote: > I'm considering writing a Performance CoPilot agent (PMDA, Performance > Metrics Domain Agent) for GPFS, and would like to collect all/most of > the metrics that are already available in the gpfs SNMP agent -- ideally > without using SNMP.. > > So, could someone help me with where to find GPFS performance data? I've > noticed "mmfsadm" has a "resetstats" option, but what are these stats / > where can I find them? All in mmpmon? > > Also the GPFS-MIB.txt seems to point at some commands I'm unfamiliar > with: > > -- all other node data from EE "get nodes" command > -- Status info from EE "get fs -b" command > -- Performance data from mmpmon "gfis" command > -- Storage pool table comes from EE "get pools" command > -- Storage pool data comes from SDR and EE "get pools" command > -- Disk data from EE "get fs" command > -- Disk performance data from mmpmon "ds" command: > -- From mmpmon nc: > > > Any idea what 'EE "get nodes"' is? And what do they mean by 'mmpmon > "gfis"', "nc" or "ds"? These commands doesn't work when fed to mmpmon.. > > > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Nov 21 20:29:05 2014 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 21 Nov 2014 14:29:05 -0600 Subject: [gpfsug-discuss] Gathering node/fs statistics ? In-Reply-To: <20141121193548.GA11920@mushkin.tanso.net> References: <20141121193548.GA11920@mushkin.tanso.net> Message-ID: You might want to look at Arxview, www.arxscan.com. I've been working with them and they have good GPFS and Storage monitoring based on mmpmon. Lightweight too. Bob Oesterlin Nuance Communications On Friday, November 21, 2014, Jan-Frode Myklebust wrote: > I'm considering writing a Performance CoPilot agent (PMDA, Performance > Metrics Domain Agent) for GPFS, and would like to collect all/most of > the metrics that are already available in the gpfs SNMP agent -- ideally > without using SNMP.. > > So, could someone help me with where to find GPFS performance data? I've > noticed "mmfsadm" has a "resetstats" option, but what are these stats / > where can I find them? All in mmpmon? 
> > Also the GPFS-MIB.txt seems to point at some commands I'm unfamiliar > with: > > -- all other node data from EE "get nodes" command > -- Status info from EE "get fs -b" command > -- Performance data from mmpmon "gfis" command > -- Storage pool table comes from EE "get pools" command > -- Storage pool data comes from SDR and EE "get pools" command > -- Disk data from EE "get fs" command > -- Disk performance data from mmpmon "ds" command: > -- From mmpmon nc: > > > Any idea what 'EE "get nodes"' is? And what do they mean by 'mmpmon > "gfis"', "nc" or "ds"? These commands doesn't work when fed to mmpmon.. > > > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From sabujp at gmail.com Fri Nov 21 22:50:02 2014 From: sabujp at gmail.com (Sabuj Pattanayek) Date: Fri, 21 Nov 2014 16:50:02 -0600 Subject: [gpfsug-discuss] any difference with the filespace view mmbackup sees from a global snapshot vs a snapshot on -j root with only 1 independent fileset (root)? Message-ID: Hi all, We're running 3.5.0.19 . Is there any difference in terms of the view of the filespace that mmbackup sees and then passes to TSM if we run mmbackup against a global snapshot vs a snapshot on -j root if we only have and ever plan on having one independent fileset (root)? It doesn't look like it to me just from ls, but just verifying. We want to get away from using a global snapshot if possible (and start using -j root snapshots) instead because for some reason it looks like it takes much much longer to run mmdelsnapshot on a global snapshot vs a snapshot on the root fileset. Thanks, Sabuj -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Mon Nov 24 21:22:19 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Mon, 24 Nov 2014 22:22:19 +0100 Subject: [gpfsug-discuss] restripe or not Message-ID: <5473A18B.7000702@ugent.be> hi all, we are going to expand an existing filestytem with approx 50% capacity. the current filesystem is 75% full. we are in downtime (for more then just this reason), so we can take the IO rebalance hit for a while (say max 48hours). my questions: a. do we really need to rebalance? the mmadddisk page suggest normally it's going to be ok, but i never understood that. new data will end up mainly on new disks, so wrt to performance, this can't really work out, can it? b. can we change the priority of rebalancing somehow (fewer nodes taking part in the rebalance?) c. once we start the rebalance, how save is it to stop with kill or ctrl-c (or can we say eg. rebalance 25% now, rest later?) (and how often can we do this? eg a daily cron job to restripe at max one hour per day, would this cause issue in the long term many thanks, stijn From zgiles at gmail.com Mon Nov 24 23:14:21 2014 From: zgiles at gmail.com (Zachary Giles) Date: Mon, 24 Nov 2014 18:14:21 -0500 Subject: [gpfsug-discuss] restripe or not In-Reply-To: <5473A18B.7000702@ugent.be> References: <5473A18B.7000702@ugent.be> Message-ID: Interesting question.. Just some thoughts: Not an expert on restriping myself: * Your new storage -- is it the same size, shape, speed as the old storage? If not, then are you going to add it to the same storage pool, or an additional storage pool? 
If additional, restripe is not needed, as you can't / don't need to restripe across storage pools, the data will be in one or the other. However, you of course will need to make a policy to place data correctly. Of course, if you're going to double your storage and all your new data will be written to the new disks, then you may be leaving quite a bit of capacity on the floor. * mmadddisk man page and normal balancing -- yes, we've seen this suggestion as well -- that is, that new data will generally fill across the cluster and eventually fill in the gaps. We didn't restripe on a much smaller storage pool and it eventually did balance out, however, it was also a "tier 1" where data is migrated out often. If I were doubling my primary storage with more of the exact same disks, I'd probably restripe. * Stopping a restripe -- I'm """ Pretty Sure """ you can stop a restripe safely with a Ctrl-C. I'm """ Pretty Sure """ we've done that a few times ourselves with no problem. I remember I was going to restripe something but the estimates were too high and so I stopped it. I'd feel fairly confident in doing it, but I don't want to take responsibility for your storage. :) :) I don't think there's a need to restripe every hour or anything. If you're generally balanced at one point, you'd probably continue to be under normal operation. On Mon, Nov 24, 2014 at 4:22 PM, Stijn De Weirdt wrote: > hi all, > > we are going to expand an existing filestytem with approx 50% capacity. > the current filesystem is 75% full. > > we are in downtime (for more then just this reason), so we can take the IO > rebalance hit for a while (say max 48hours). > > my questions: > a. do we really need to rebalance? the mmadddisk page suggest normally > it's going to be ok, but i never understood that. new data will end up > mainly on new disks, so wrt to performance, this can't really work out, can > it? > b. can we change the priority of rebalancing somehow (fewer nodes taking > part in the rebalance?) > c. once we start the rebalance, how save is it to stop with kill or ctrl-c > (or can we say eg. rebalance 25% now, rest later?) > (and how often can we do this? eg a daily cron job to restripe at max one > hour per day, would this cause issue in the long term > > > many thanks, > > stijn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Tue Nov 25 02:01:06 2014 From: oester at gmail.com (Bob Oesterlin) Date: Mon, 24 Nov 2014 20:01:06 -0600 Subject: [gpfsug-discuss] restripe or not In-Reply-To: <5473A18B.7000702@ugent.be> References: <5473A18B.7000702@ugent.be> Message-ID: In general, the need to restripe after a disk add is dependent on a number of factors, as has been pointed out.. A couple of other thoughts/suggestions: - One thing you might consider (depending on your pattern of read/write traffic), is selectively suspending one or more of the existing NSDs, forcing GPFS to write new blocks to the new NSDs. That way at least some of the new data is being written to the new storage by default, rather than using up blocks on the existing NSDs. You can suspend/resume disks at any time. - You can pick a subset of nodes to perform the restripe with "mmrestripefs -N node1,node2,..." 
Keep in mind you'll get much better performance and less impact to the filesystem if you choose NSD servers with direct access to the disk. - Resume of restripe: Yes, you can do this, no harm, done it many times. You can track the balance of the disks using "mmdf ". This is a pretty intensive command, so I wouldn't run in frequently. Check it a few times each day, see if the data balance is improving by itself. When you stop/restart it, the restripe doesn't pick up exactly where it left off, it's going to scan the entire file system again. - You can also restripe single files if the are large and get a heavy I/O (mmrestripefile) Bob Oesterlin Nuance Communications On Mon, Nov 24, 2014 at 3:22 PM, Stijn De Weirdt wrote: > hi all, > > we are going to expand an existing filestytem with approx 50% capacity. > the current filesystem is 75% full. > > we are in downtime (for more then just this reason), so we can take the IO > rebalance hit for a while (say max 48hours). > > my questions: > a. do we really need to rebalance? the mmadddisk page suggest normally > it's going to be ok, but i never understood that. new data will end up > mainly on new disks, so wrt to performance, this can't really work out, can > it? > b. can we change the priority of rebalancing somehow (fewer nodes taking > part in the rebalance?) > c. once we start the rebalance, how save is it to stop with kill or ctrl-c > (or can we say eg. rebalance 25% now, rest later?) > (and how often can we do this? eg a daily cron job to restripe at max one > hour per day, would this cause issue in the long term > > > many thanks, > > stijn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Tue Nov 25 07:17:56 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 25 Nov 2014 08:17:56 +0100 Subject: [gpfsug-discuss] restripe or not In-Reply-To: References: <5473A18B.7000702@ugent.be> Message-ID: <54742D24.8090602@ugent.be> hi zachary, > * Your new storage -- is it the same size, shape, speed as the old storage? yes. we created and used it as "test" filesystem on the same hardware when we started. now we are shrinking the test filesystem and adding the free disks to the production one. > If not, then are you going to add it to the same storage pool, or an > additional storage pool? If additional, restripe is not needed, as you > can't / don't need to restripe across storage pools, the data will be in > one or the other. However, you of course will need to make a policy to > place data correctly. sure, but in this case, they end up in teh same pool. > Of course, if you're going to double your storage and all your new data > will be written to the new disks, then you may be leaving quite a bit of > capacity on the floor. > > * mmadddisk man page and normal balancing -- yes, we've seen this > suggestion as well -- that is, that new data will generally fill across the > cluster and eventually fill in the gaps. We didn't restripe on a much > smaller storage pool and it eventually did balance out, however, it was > also a "tier 1" where data is migrated out often. If I were doubling my > primary storage with more of the exact same disks, I'd probably restripe. more then half of the data on the current filesystem is more or less static (we expect it to stay there 2-3 year unmodified). 
similar data will be added in the near future. > > * Stopping a restripe -- I'm """ Pretty Sure """ you can stop a restripe > safely with a Ctrl-C. I'm """ Pretty Sure """ we've done that a few times > ourselves with no problem. I remember I was going to restripe something but > the estimates were too high and so I stopped it. I'd feel fairly confident > in doing it, but I don't want to take responsibility for your storage. :) yeah, i've also remember cancelling a restripe and i'm pretty sure it ddin't cause problems (i would certainly remember the problems ;) i'm looking for some further confirmation (or e.g. a reference to some docuemnt that says so. i vaguely remember sven(?) saying this on the lodon gpfs user day this year. > :) I don't think there's a need to restripe every hour or anything. If > you're generally balanced at one point, you'd probably continue to be under > normal operation. i was thinking to spread the total restripe over one or 2 hour periods each days the coming week(s); but i'm now realising this might not be the best idea, because it will rebalance any new data as well, slowing down the bulk rebalancing. anyway, thanks for the feedback. i'll probably let the rebalance run for 48 hours and see how far it got by that time. stijn > > > > > > On Mon, Nov 24, 2014 at 4:22 PM, Stijn De Weirdt > wrote: > >> hi all, >> >> we are going to expand an existing filestytem with approx 50% capacity. >> the current filesystem is 75% full. >> >> we are in downtime (for more then just this reason), so we can take the IO >> rebalance hit for a while (say max 48hours). >> >> my questions: >> a. do we really need to rebalance? the mmadddisk page suggest normally >> it's going to be ok, but i never understood that. new data will end up >> mainly on new disks, so wrt to performance, this can't really work out, can >> it? >> b. can we change the priority of rebalancing somehow (fewer nodes taking >> part in the rebalance?) >> c. once we start the rebalance, how save is it to stop with kill or ctrl-c >> (or can we say eg. rebalance 25% now, rest later?) >> (and how often can we do this? eg a daily cron job to restripe at max one >> hour per day, would this cause issue in the long term >> >> >> many thanks, >> >> stijn >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From stijn.deweirdt at ugent.be Tue Nov 25 07:23:41 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 25 Nov 2014 08:23:41 +0100 Subject: [gpfsug-discuss] restripe or not In-Reply-To: References: <5473A18B.7000702@ugent.be> Message-ID: <54742E7D.7090009@ugent.be> hi bob, > - One thing you might consider (depending on your pattern of read/write > traffic), is selectively suspending one or more of the existing NSDs, > forcing GPFS to write new blocks to the new NSDs. That way at least some of > the new data is being written to the new storage by default, rather than > using up blocks on the existing NSDs. You can suspend/resume disks at any > time. is the gpfs placment weighted with the avalaible volume? i'd rather not make this a manual operation. > > - You can pick a subset of nodes to perform the restripe with "mmrestripefs > -N node1,node2,..." 
Keep in mind you'll get much better performance and > less impact to the filesystem if you choose NSD servers with direct access > to the disk. yes and i no i guess, our nsds see all disks, but the problem with nsds is that they don't honour any roles (our primary nsds have the preferred path to the controller and lun, meaning all access from non-primary nsd to that disk is suboptimal). > > - Resume of restripe: Yes, you can do this, no harm, done it many times. > You can track the balance of the disks using "mmdf ". This is a > pretty intensive command, so I wouldn't run in frequently. Check it a few > times each day, see if the data balance is improving by itself. When you thanks for the tip to monitor it with mmdf! > stop/restart it, the restripe doesn't pick up exactly where it left off, > it's going to scan the entire file system again. yeah, i realised that this is a flaw in my "one-hour a day" restripe idea ;) > > - You can also restripe single files if the are large and get a heavy I/O > (mmrestripefile) excellent tip! forgot about that one. if the rebalnce is to slow, i can run this on the static data. thanks a lot for the feedback stijn > > Bob Oesterlin > Nuance Communications > > > On Mon, Nov 24, 2014 at 3:22 PM, Stijn De Weirdt > wrote: > >> hi all, >> >> we are going to expand an existing filestytem with approx 50% capacity. >> the current filesystem is 75% full. >> >> we are in downtime (for more then just this reason), so we can take the IO >> rebalance hit for a while (say max 48hours). >> >> my questions: >> a. do we really need to rebalance? the mmadddisk page suggest normally >> it's going to be ok, but i never understood that. new data will end up >> mainly on new disks, so wrt to performance, this can't really work out, can >> it? >> b. can we change the priority of rebalancing somehow (fewer nodes taking >> part in the rebalance?) >> c. once we start the rebalance, how save is it to stop with kill or ctrl-c >> (or can we say eg. rebalance 25% now, rest later?) >> (and how often can we do this? eg a daily cron job to restripe at max one >> hour per day, would this cause issue in the long term >> >> >> many thanks, >> >> stijn >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > From L.A.Hurst at bham.ac.uk Tue Nov 25 10:45:51 2014 From: L.A.Hurst at bham.ac.uk (Laurence Alexander Hurst) Date: Tue, 25 Nov 2014 10:45:51 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users Message-ID: Hi all, We have noticed that once users are deleted (gone entirely from the passwd backend) and all their files removed, their uid is still reported by GPFS? quota tools (albeit with zero files and space usage). Does anyone know if there is a way to clear out these spurious entries or is it a case that once a uid is in the quota database its there forever regardless of if that uid is still in use and has quota to record? Many Thanks, Laurence From ewahl at osc.edu Tue Nov 25 13:52:55 2014 From: ewahl at osc.edu (Wahl, Edward) Date: Tue, 25 Nov 2014 13:52:55 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: References: Message-ID: <1416923575.2343.18.camel@localhost.localdomain> Do you still have policies or filesets associated with these users? 
Ed Wahl OSC On Tue, 2014-11-25 at 10:45 +0000, Laurence Alexander Hurst wrote: > Hi all, > > We have noticed that once users are deleted (gone entirely from the passwd backend) and all their files removed, their uid is still reported by GPFS? quota tools (albeit with zero files and space usage). > > Does anyone know if there is a way to clear out these spurious entries or is it a case that once a uid is in the quota database its there forever regardless of if that uid is still in use and has quota to record? > > Many Thanks, > > Laurence > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Tue Nov 25 14:00:29 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Nov 2014 14:00:29 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: References: Message-ID: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> On Tue, 2014-11-25 at 10:45 +0000, Laurence Alexander Hurst wrote: > Hi all, > > We have noticed that once users are deleted (gone entirely from the > passwd backend) and all their files removed, their uid is still > reported by GPFS? quota tools (albeit with zero files and space usage). > There is something somewhere that references them, because they do disappear. I know because I cleared out a GPFS file system that had files and directories used by "depreciated" user and group names, and the check I was using to make sure I had got everything belonging to a particular user or group was mmrepquota. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From stijn.deweirdt at ugent.be Tue Nov 25 16:25:58 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 25 Nov 2014 17:25:58 +0100 Subject: [gpfsug-discuss] gpfs.gnr updates Message-ID: <5474AD96.3050006@ugent.be> hi all, does anyone know where we can find the release notes and update rpms for gpfs.gnr? we logged a case with ibm a while ago, and we assumed that the fix for the issue was part of the regular gpfs updates (we assumed as much from the conversation with ibm tech support). many thanks, stijn From L.A.Hurst at bham.ac.uk Wed Nov 26 10:14:26 2014 From: L.A.Hurst at bham.ac.uk (Laurence Alexander Hurst) Date: Wed, 26 Nov 2014 10:14:26 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> Message-ID: Hmm, mmrepquota is reporting no files owned by any of the users in question. I?ll see if `find` disagrees. They have the default fileset user quotas applied, so they?re not users we?ve edited to grant quota extensions to. We have had a problem (which IBM have acknowledged, iirc) whereby it is not possible to reset a user?s quota back to the default if it has been modified, perhaps this is related? I?ll see if `find` turns anything up or I?ll raise a ticket with IBM and see what they think. I?ve pulled out a single example, but all 75 users I have are the same. mmrepquota gpfs | grep 8695 8695 nbu USR 0 0 5368709120 0 none | 0 0 0 0 none 8695 bb USR 0 0 1073741824 0 none | 0 0 0 0 none Thanks for your input. 
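For what it's worth, the checks described here might look something like the following, with /gpfs and UID 8695 taken from the example above. This is a sketch rather than a recipe, and the mmedquota syntax in particular should be checked against the manual for your release before use:

    # any files left anywhere on the filesystem owned by the departed UID?
    find /gpfs -xdev -uid 8695 -ls

    # the per-fileset quota entries for that UID (headers included)
    mmrepquota -u gpfs | awk 'NR <= 3 || $1 == 8695'

    # if nothing turns up, putting the entry back onto the default limits
    # may make it drop out of the report (though note the known issue,
    # mentioned above, with resetting modified quotas back to the default)
    mmedquota -d -u 8695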
Laurence On 25/11/2014 14:00, "Jonathan Buzzard" wrote: >On Tue, 2014-11-25 at 10:45 +0000, Laurence Alexander Hurst wrote: >> Hi all, >> >> We have noticed that once users are deleted (gone entirely from the >> passwd backend) and all their files removed, their uid is still >> reported by GPFS? quota tools (albeit with zero files and space usage). >> > >There is something somewhere that references them, because they do >disappear. I know because I cleared out a GPFS file system that had >files and directories used by "depreciated" user and group names, and >the check I was using to make sure I had got everything belonging to a >particular user or group was mmrepquota. > >JAB. > >-- >Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk >Fife, United Kingdom. > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chair at gpfsug.org Thu Nov 27 09:21:30 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 27 Nov 2014 09:21:30 +0000 Subject: [gpfsug-discuss] gpfs.gnr updates In-Reply-To: <5474AD96.3050006@ugent.be> References: <5474AD96.3050006@ugent.be> Message-ID: <5476ED1A.8050504@gpfsug.org> Hi Stijn, As far as I am aware, GNR updates are not publicly available for download. You should approach your reseller / IBM Business partner who should be able to supply you with the updates. IBMers, please feel free to correct this statement if in error. Jez On 25/11/14 16:25, Stijn De Weirdt wrote: > hi all, > > does anyone know where we can find the release notes and update rpms > for gpfs.gnr? > we logged a case with ibm a while ago, and we assumed that the fix for > the issue was part of the regular gpfs updates (we assumed as much > from the conversation with ibm tech support). > > many thanks, > > stijn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Thu Nov 27 09:47:59 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Nov 2014 09:47:59 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> Message-ID: <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: > Hmm, mmrepquota is reporting no files owned by any of the users in > question. I?ll see if `find` disagrees. > They have the default fileset > user quotas applied, so they?re not users we?ve edited to grant quota > extensions to. We have had a problem (which IBM have acknowledged, iirc) > whereby it is not possible to reset a user?s quota back to the default if > it has been modified, perhaps this is related? I?ll see if `find` turns > anything up or I?ll raise a ticket with IBM and see what they think. > > I?ve pulled out a single example, but all 75 users I have are the same. > > mmrepquota gpfs | grep 8695 > 8695 nbu USR 0 0 5368709120 0 > none | 0 0 0 0 none > 8695 bb USR 0 0 1073741824 0 > none | 0 0 0 0 none > While the number of files and usage is zero look at those "in doubt" numbers. Until these also fall to zero then the users are not going to disappear from the quota reporting would be my guess. Quite why the "in doubt" numbers are still so large is another question. 
I have vague recollections of this happening to me when I deleted large amounts of data belonging to a user down to zero when I was clearing the file system up I mentioned before. Though to be honest most of my clearing up was identifying who the files really belonged to (there had in the distance past been a change of usernames; gone from local usernames to using the university wide ones and not everyone had claimed their files. All related to a move to using Active Directory) and doing chown's on the data. I think what happens is when the file number goes to zero the quota system stops updating for that user and if there is anything "in doubt" it never gets updated and sticks around forever. Might be worth creating a couple of files for the user in the appropriate filesets and then give it a bit of time and see if the output of mmrepquota matches what you believe is the real case. If this works and the "in doubt" number goes to zero I would at this point do a chown to a different user that is not going away and then delete the files. Something else to consider is that they might be in an ACL somewhere which is confusing the quota system. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From peserocka at gmail.com Thu Nov 27 10:01:55 2014 From: peserocka at gmail.com (P Serocka) Date: Thu, 27 Nov 2014 18:01:55 +0800 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> Message-ID: Any chance to run mmcheckquota? which should remove all "doubt"... On 2014 Nov 27. md, at 17:47 st, Jonathan Buzzard wrote: > On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: >> Hmm, mmrepquota is reporting no files owned by any of the users in >> question. I?ll see if `find` disagrees. >> They have the default fileset >> user quotas applied, so they?re not users we?ve edited to grant quota >> extensions to. We have had a problem (which IBM have acknowledged, iirc) >> whereby it is not possible to reset a user?s quota back to the default if >> it has been modified, perhaps this is related? I?ll see if `find` turns >> anything up or I?ll raise a ticket with IBM and see what they think. >> >> I?ve pulled out a single example, but all 75 users I have are the same. >> >> mmrepquota gpfs | grep 8695 >> 8695 nbu USR 0 0 5368709120 0 >> none | 0 0 0 0 none >> 8695 bb USR 0 0 1073741824 0 >> none | 0 0 0 0 none >> > > While the number of files and usage is zero look at those "in doubt" > numbers. Until these also fall to zero then the users are not going to > disappear from the quota reporting would be my guess. Quite why the "in > doubt" numbers are still so large is another question. I have vague > recollections of this happening to me when I deleted large amounts of > data belonging to a user down to zero when I was clearing the file > system up I mentioned before. Though to be honest most of my clearing up > was identifying who the files really belonged to (there had in the > distance past been a change of usernames; gone from local usernames to > using the university wide ones and not everyone had claimed their files. > All related to a move to using Active Directory) and doing chown's on > the data. 
> > I think what happens is when the file number goes to zero the quota > system stops updating for that user and if there is anything "in doubt" > it never gets updated and sticks around forever. > > Might be worth creating a couple of files for the user in the > appropriate filesets and then give it a bit of time and see if the > output of mmrepquota matches what you believe is the real case. If this > works and the "in doubt" number goes to zero I would at this point do a > chown to a different user that is not going away and then delete the > files. > > Something else to consider is that they might be in an ACL somewhere > which is confusing the quota system. > > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Thu Nov 27 10:02:03 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Nov 2014 10:02:03 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> Message-ID: <1417082523.4171.104.camel@buzzard.phy.strath.ac.uk> On Thu, 2014-11-27 at 09:47 +0000, Jonathan Buzzard wrote: > On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: > > Hmm, mmrepquota is reporting no files owned by any of the users in > > question. I?ll see if `find` disagrees. > > They have the default fileset > > user quotas applied, so they?re not users we?ve edited to grant quota > > extensions to. We have had a problem (which IBM have acknowledged, iirc) > > whereby it is not possible to reset a user?s quota back to the default if > > it has been modified, perhaps this is related? I?ll see if `find` turns > > anything up or I?ll raise a ticket with IBM and see what they think. > > > > I?ve pulled out a single example, but all 75 users I have are the same. > > > > mmrepquota gpfs | grep 8695 > > 8695 nbu USR 0 0 5368709120 0 > > none | 0 0 0 0 none > > 8695 bb USR 0 0 1073741824 0 > > none | 0 0 0 0 none > > > > While the number of files and usage is zero look at those "in doubt" > numbers. Ignore that those are quota numbers. Hard when the column headings are missing. Anyway a "Homer Simpson" momentum coming up... Simple answer really remove the quotas for those users in those file sets (I am presuming they are per fileset user hard limits). They are sticking around in mmrepquota because they have a hard limit set. D'oh! JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From peserocka at gmail.com Thu Nov 27 10:06:31 2014 From: peserocka at gmail.com (P Serocka) Date: Thu, 27 Nov 2014 18:06:31 +0800 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1417082523.4171.104.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> <1417082523.4171.104.camel@buzzard.phy.strath.ac.uk> Message-ID: <44A03A01-4010-4210-8892-2AE37451EEFA@gmail.com> ;-) Ignore my other message on mmcheckquota then. On 2014 Nov 27. 
md, at 18:02 st, Jonathan Buzzard wrote: > On Thu, 2014-11-27 at 09:47 +0000, Jonathan Buzzard wrote: >> On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: >>> Hmm, mmrepquota is reporting no files owned by any of the users in >>> question. I?ll see if `find` disagrees. >>> They have the default fileset >>> user quotas applied, so they?re not users we?ve edited to grant quota >>> extensions to. We have had a problem (which IBM have acknowledged, iirc) >>> whereby it is not possible to reset a user?s quota back to the default if >>> it has been modified, perhaps this is related? I?ll see if `find` turns >>> anything up or I?ll raise a ticket with IBM and see what they think. >>> >>> I?ve pulled out a single example, but all 75 users I have are the same. >>> >>> mmrepquota gpfs | grep 8695 >>> 8695 nbu USR 0 0 5368709120 0 >>> none | 0 0 0 0 none >>> 8695 bb USR 0 0 1073741824 0 >>> none | 0 0 0 0 none >>> >> >> While the number of files and usage is zero look at those "in doubt" >> numbers. > > Ignore that those are quota numbers. Hard when the column headings are > missing. > > Anyway a "Homer Simpson" momentum coming up... > > Simple answer really remove the quotas for those users in those file > sets (I am presuming they are per fileset user hard limits). They are > sticking around in mmrepquota because they have a hard limit set. D'oh! > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sdinardo at ebi.ac.uk Wed Nov 5 10:15:59 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 05 Nov 2014 10:15:59 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs Message-ID: <5459F8DF.2090806@ebi.ac.uk> Hello again, to understand better GPFS, recently i build up an test gpfs cluster using some old hardware that was going to be retired. THe storage was SAN devices, so instead to use native raids I went for the old school gpfs. the configuration is basically: 3x servers 3x san storages 2x san switches I did no zoning, so all the servers can see all the LUNs, but on nsd creation I gave each LUN a primary, secondary and third server. with the following rule: STORAGE primary secondary tertiary storage1 server1 server2 server3 storage2 server2 server3 server1 storage3 server3 server1 server2 looking at the mmcrnsd, it was my understanding that the primary server is the one that wrote on the NSD unless it fails, then the following server take the ownership of the lun. Now come the question: when i did from server 1 a dd surprisingly i discovered that server1 was writing to all the luns. the other 2 server was doing nothing. this behaviour surprises me because on GSS only the RG owner can write, so one server "ask" the other server to write to his own RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each server. While i understand that the situation it's different I'm puzzled about the fact that all the servers seems able to write to all the luns. SAN deviced usually should be connected to one server only, as paralled access could create data corruption. In environments where you connect a SAN to multiple servers ( example VMWARE cloud) its softeware task to avoid data overwriting between server ( and data corruption ). 
Honestly, what i was expecting is: server1 writing on his own luns, and data traffic ( ethernet) to the other 2 server , basically asking *them* to write on the other luns. I dont know if this behaviour its normal or not. I triied to find a documentation about that, but could not find any. Could somebody tell me if this _/"every server write to all the luns"/_ its intended or not? Thanks in advance, Salvatore -------------- next part -------------- An HTML attachment was scrubbed... URL: From viccornell at gmail.com Wed Nov 5 10:22:38 2014 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 5 Nov 2014 10:22:38 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> Hi Salvatore, If you are doing the IO on the NSD server itself and it can see all of the NSDs it will use its "local? access to write to the LUNS. You need some GPFS clients to see the workload spread across all of the NSD servers. Vic > On 5 Nov 2014, at 10:15, Salvatore Di Nardo wrote: > > Hello again, > to understand better GPFS, recently i build up an test gpfs cluster using some old hardware that was going to be retired. THe storage was SAN devices, so instead to use native raids I went for the old school gpfs. the configuration is basically: > > 3x servers > 3x san storages > 2x san switches > > I did no zoning, so all the servers can see all the LUNs, but on nsd creation I gave each LUN a primary, secondary and third server. with the following rule: > > STORAGE > primary > secondary > tertiary > storage1 > server1 > server2 server3 > storage2 server2 server3 server1 > storage3 server3 server1 server2 > > looking at the mmcrnsd, it was my understanding that the primary server is the one that wrote on the NSD unless it fails, then the following server take the ownership of the lun. > > Now come the question: > when i did from server 1 a dd surprisingly i discovered that server1 was writing to all the luns. the other 2 server was doing nothing. this behaviour surprises me because on GSS only the RG owner can write, so one server "ask" the other server to write to his own RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each server. While i understand that the situation it's different I'm puzzled about the fact that all the servers seems able to write to all the luns. > > SAN deviced usually should be connected to one server only, as paralled access could create data corruption. In environments where you connect a SAN to multiple servers ( example VMWARE cloud) its softeware task to avoid data overwriting between server ( and data corruption ). > > Honestly, what i was expecting is: server1 writing on his own luns, and data traffic ( ethernet) to the other 2 server , basically asking them to write on the other luns. I dont know if this behaviour its normal or not. I triied to find a documentation about that, but could not find any. > > Could somebody tell me if this "every server write to all the luns" its intended or not? > > Thanks in advance, > Salvatore > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stijn.deweirdt at ugent.be Wed Nov 5 10:25:07 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Wed, 05 Nov 2014 11:25:07 +0100 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <5459FB03.8080801@ugent.be> yes, this behaviour is normal, and a bit annoying sometimes, but GPFS doesn't really like (or isn't designed) to run stuff on the NSDs directly. the GSS probably send the data to the other NSD to distribute the (possible) compute cost from the raid, where there is none for regular LUN access. (but you also shouldn't be running on the GSS NSDs ;) stijn On 11/05/2014 11:15 AM, Salvatore Di Nardo wrote: > Hello again, > to understand better GPFS, recently i build up an test gpfs cluster > using some old hardware that was going to be retired. THe storage was > SAN devices, so instead to use native raids I went for the old school > gpfs. the configuration is basically: > > 3x servers > 3x san storages > 2x san switches > > I did no zoning, so all the servers can see all the LUNs, but on nsd > creation I gave each LUN a primary, secondary and third server. with the > following rule: > > STORAGE > primary > secondary > tertiary > storage1 > server1 > server2 server3 > storage2 server2 server3 server1 > storage3 server3 server1 server2 > > > > looking at the mmcrnsd, it was my understanding that the primary server > is the one that wrote on the NSD unless it fails, then the following > server take the ownership of the lun. > > Now come the question: > when i did from server 1 a dd surprisingly i discovered that server1 was > writing to all the luns. the other 2 server was doing nothing. this > behaviour surprises me because on GSS only the RG owner can write, so > one server "ask" the other server to write to his own RG's.In fact on > GSS can be seen a lot of ETH traffic and io/s on each server. While i > understand that the situation it's different I'm puzzled about the fact > that all the servers seems able to write to all the luns. > > SAN deviced usually should be connected to one server only, as paralled > access could create data corruption. In environments where you connect a > SAN to multiple servers ( example VMWARE cloud) its softeware task to > avoid data overwriting between server ( and data corruption ). > > Honestly, what i was expecting is: server1 writing on his own luns, and > data traffic ( ethernet) to the other 2 server , basically asking *them* > to write on the other luns. I dont know if this behaviour its normal or > not. I triied to find a documentation about that, but could not find any. > > Could somebody tell me if this _/"every server write to all the luns"/_ > its intended or not? > > Thanks in advance, > Salvatore > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From kgunda at in.ibm.com Wed Nov 5 10:25:07 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Wed, 5 Nov 2014 15:55:07 +0530 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: In case of SAN connectivity, all nodes can write to disks. This avoids going over the network to get to disks. Only when local access isn't present either due to connectivity or zoning will it use the defined NSD server. 
If there is a need to have the node always use a NSD server, you can enforce it via mount option -o usensdserver=always If the first nsd server is down, it will use the next NSD server in the list. In general NSD servers are a priority list of servers rather than a primary/secondary config which is the case when using native raid. Also note that multiple nodes accessing the same disk will not cause corruption as higher level token mgmt in GPFS will take care of data consistency. Regards Kalyan C Gunda STSM, Elastic Storage Development Member of The IBM Academy of Technology EGL D Block, Bangalore From: Salvatore Di Nardo To: gpfsug main discussion list Date: 11/05/2014 03:44 PM Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs Sent by: gpfsug-discuss-bounces at gpfsug.org Hello again, to understand better GPFS, recently i build up an test gpfs cluster using some old hardware that was going to be retired. THe storage was SAN devices, so instead to use native raids I went for the old school gpfs. the configuration is basically: 3x servers 3x san storages 2x san switches I did no zoning, so all the servers can see all the LUNs, but on nsd creation I gave each LUN a primary, secondary and third server. with the following rule: |-------------------+---------------+--------------------+---------------| |STORAGE |primary |secondary |tertiary | |-------------------+---------------+--------------------+---------------| |storage1 |server1 |server2 |server3 | |-------------------+---------------+--------------------+---------------| |storage2 |server2 |server3 |server1 | |-------------------+---------------+--------------------+---------------| |storage3 |server3 |server1 |server2 | |-------------------+---------------+--------------------+---------------| looking at the mmcrnsd, it was my understanding that the primary server is the one that wrote on the NSD unless it fails, then the following server take the ownership of the lun. Now come the question: when i did from server 1 a dd surprisingly i discovered that server1 was writing to all the luns. the other 2 server was doing nothing. this behaviour surprises me because on GSS only the RG owner can write, so one server "ask" the other server to write to his own RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each server. While i understand that the situation it's different I'm puzzled about the fact that all the servers seems able to write to all the luns. SAN deviced usually should be connected to one server only, as paralled access could create data corruption. In environments where you connect a SAN to multiple servers ( example VMWARE cloud) its softeware task to avoid data overwriting between server ( and data corruption ). Honestly, what? i was expecting is: server1 writing on his own luns, and data traffic ( ethernet) to the other 2 server , basically asking them to write on the other luns. I dont know if this behaviour its normal or not. I triied to find a documentation about that, but could not find any. Could somebody? tell me if this "every server write to all the luns" its intended or not? 
Thanks in advance, Salvatore_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sdinardo at ebi.ac.uk Wed Nov 5 10:33:57 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 05 Nov 2014 10:33:57 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> References: <5459F8DF.2090806@ebi.ac.uk> <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> Message-ID: <5459FD15.3070105@ebi.ac.uk> I understand that my test its a bit particular because the client was also one of the servers. Usually clients don't have direct access to the storages, but still it made think, hot the things are supposed to work. For example i did another test with 3 dd's, one each server. All the servers was writing to all the luns. In other words a lun was accessed in parallel by 3 servers. Its that a problem, or gpfs manage properly the concurrency and avoid data corruption? I'm asking because i was not expecting a server to write to an NSD he doesn't own, even if its locally available. I thought that the general availablity was for failover, not for parallel access. Regards, Salvatore On 05/11/14 10:22, Vic Cornell wrote: > Hi Salvatore, > > If you are doing the IO on the NSD server itself and it can see all of > the NSDs it will use its "local? access to write to the LUNS. > > You need some GPFS clients to see the workload spread across all of > the NSD servers. > > Vic > > > >> On 5 Nov 2014, at 10:15, Salvatore Di Nardo > > wrote: >> >> Hello again, >> to understand better GPFS, recently i build up an test gpfs cluster >> using some old hardware that was going to be retired. THe storage was >> SAN devices, so instead to use native raids I went for the old school >> gpfs. the configuration is basically: >> >> 3x servers >> 3x san storages >> 2x san switches >> >> I did no zoning, so all the servers can see all the LUNs, but on nsd >> creation I gave each LUN a primary, secondary and third server. with >> the following rule: >> >> STORAGE >> primary >> secondary >> tertiary >> storage1 >> server1 >> server2 server3 >> storage2 server2 server3 server1 >> storage3 server3 server1 server2 >> >> >> >> looking at the mmcrnsd, it was my understanding that the primary >> server is the one that wrote on the NSD unless it fails, then the >> following server take the ownership of the lun. >> >> Now come the question: >> when i did from server 1 a dd surprisingly i discovered that server1 >> was writing to all the luns. the other 2 server was doing nothing. >> this behaviour surprises me because on GSS only the RG owner can >> write, so one server "ask" the other server to write to his own >> RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each >> server. While i understand that the situation it's different I'm >> puzzled about the fact that all the servers seems able to write to >> all the luns. >> >> SAN deviced usually should be connected to one server only, as >> paralled access could create data corruption. In environments where >> you connect a SAN to multiple servers ( example VMWARE cloud) its >> softeware task to avoid data overwriting between server ( and data >> corruption ). >> >> Honestly, what i was expecting is: server1 writing on his own luns, >> and data traffic ( ethernet) to the other 2 server , basically asking >> *them* to write on the other luns. 
I dont know if this behaviour its >> normal or not. I triied to find a documentation about that, but could >> not find any. >> >> Could somebody tell me if this _/"every server write to all the >> luns"/_ its intended or not? >> >> Thanks in advance, >> Salvatore >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Nov 5 10:38:48 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 05 Nov 2014 10:38:48 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <1415183928.3474.4.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-11-05 at 10:15 +0000, Salvatore Di Nardo wrote: [SNIP] > Now come the question: > when i did from server 1 a dd surprisingly i discovered that server1 > was writing to all the luns. the other 2 server was doing nothing. > this behaviour surprises me because on GSS only the RG owner can > write, so one server "ask" the other server to write to his own > RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each > server. While i understand that the situation it's different I'm > puzzled about the fact that all the servers seems able to write to all > the luns. The difference is that in GSS the NSD servers are in effect doing software RAID on the disks. Therefore they and they alone can write to the NSD. In the traditional setup the NSD is on a RAID device on SAN controller and multiple machines are able to access the block device at the same time with token management in GPFS preventing corruption. I guess from a technical perspective you could have the GSS software RAID distributed between the NSD servers, but that would be rather more complex software and it is no surprise IBM have gone down the easy route. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From viccornell at gmail.com Wed Nov 5 10:42:22 2014 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 5 Nov 2014 10:42:22 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459FD15.3070105@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> <5459FD15.3070105@ebi.ac.uk> Message-ID: <75801708-F65D-4B39-82CA-6DC4FB5AA6EB@gmail.com> > On 5 Nov 2014, at 10:33, Salvatore Di Nardo wrote: > > I understand that my test its a bit particular because the client was also one of the servers. > Usually clients don't have direct access to the storages, but still it made think, hot the things are supposed to work. > > For example i did another test with 3 dd's, one each server. All the servers was writing to all the luns. > In other words a lun was accessed in parallel by 3 servers. > > Its that a problem, or gpfs manage properly the concurrency and avoid data corruption? Its not a problem if you use locks. Remember the clients - even the ones running on the NSD servers are talking to the filesystem - not to the LUNS/NSDs directly. It is the NSD processes that talk to the NSDs. 
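A quick way to watch the token layer doing its job is to drive one writer from each NSD server at the same time and then check who is acting as the file system manager. A rough sketch, where server1-3, fs1 and the paths are only placeholders:

    # one writer per NSD server, all into the same GPFS file system;
    # token management keeps the concurrent access consistent
    ssh server1 'dd if=/dev/zero of=/gpfs/fs1/t1 bs=1M count=1024' &
    ssh server2 'dd if=/dev/zero of=/gpfs/fs1/t2 bs=1M count=1024' &
    ssh server3 'dd if=/dev/zero of=/gpfs/fs1/t3 bs=1M count=1024' &
    wait

    # shows which node currently acts as the file system manager for fs1
    mmlsmgr fs1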
So loosely speaking it is as if all of the processes you are running were running on a single system with a local filesystem So yes - gpfs is designed to manage the problems created by having a distributed, shared filesystem, and does a pretty good job IMHO. > I'm asking because i was not expecting a server to write to an NSD he doesn't own, even if its locally available. > I thought that the general availablity was for failover, not for parallel access. Bear in mind that GPFS supports a number of access models, one of which is where all of the systems in the cluster have access to all of the disks. So parallel access is most commonly used for failover, but that is not the limit of its capabilities. Vic > > > Regards, > Salvatore > > > > On 05/11/14 10:22, Vic Cornell wrote: >> Hi Salvatore, >> >> If you are doing the IO on the NSD server itself and it can see all of the NSDs it will use its "local? access to write to the LUNS. >> >> You need some GPFS clients to see the workload spread across all of the NSD servers. >> >> Vic >> >> >> >>> On 5 Nov 2014, at 10:15, Salvatore Di Nardo > wrote: >>> >>> Hello again, >>> to understand better GPFS, recently i build up an test gpfs cluster using some old hardware that was going to be retired. THe storage was SAN devices, so instead to use native raids I went for the old school gpfs. the configuration is basically: >>> >>> 3x servers >>> 3x san storages >>> 2x san switches >>> >>> I did no zoning, so all the servers can see all the LUNs, but on nsd creation I gave each LUN a primary, secondary and third server. with the following rule: >>> >>> STORAGE >>> primary >>> secondary >>> tertiary >>> storage1 >>> server1 >>> server2 server3 >>> storage2 server2 server3 server1 >>> storage3 server3 server1 server2 >>> >>> looking at the mmcrnsd, it was my understanding that the primary server is the one that wrote on the NSD unless it fails, then the following server take the ownership of the lun. >>> >>> Now come the question: >>> when i did from server 1 a dd surprisingly i discovered that server1 was writing to all the luns. the other 2 server was doing nothing. this behaviour surprises me because on GSS only the RG owner can write, so one server "ask" the other server to write to his own RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each server. While i understand that the situation it's different I'm puzzled about the fact that all the servers seems able to write to all the luns. >>> >>> SAN deviced usually should be connected to one server only, as paralled access could create data corruption. In environments where you connect a SAN to multiple servers ( example VMWARE cloud) its softeware task to avoid data overwriting between server ( and data corruption ). >>> >>> Honestly, what i was expecting is: server1 writing on his own luns, and data traffic ( ethernet) to the other 2 server , basically asking them to write on the other luns. I dont know if this behaviour its normal or not. I triied to find a documentation about that, but could not find any. >>> >>> Could somebody tell me if this "every server write to all the luns" its intended or not? 
>>> >>> Thanks in advance, >>> Salvatore >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Wed Nov 5 10:46:52 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 05 Nov 2014 10:46:52 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <545A001C.1040908@ebi.ac.uk> On 05/11/14 10:25, Kalyan Gunda wrote: > Also note that multiple nodes accessing the same disk will not cause > corruption as higher level token mgmt in GPFS will take care of data > consistency. This is exactly what I wanted to be sure. Thanks! From ewahl at osc.edu Wed Nov 5 13:56:38 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 5 Nov 2014 13:56:38 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <545A001C.1040908@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> , <545A001C.1040908@ebi.ac.uk> Message-ID: You can designate how many of the nodes do token management as well. mmlscluster should show which are "manager"s. Under some circumstances you may want to increase the defaults on heavily used file systems using mmchnode, especially with few NSDs and many writers. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Wednesday, November 05, 2014 5:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] maybe a silly question about "old school" gpfs On 05/11/14 10:25, Kalyan Gunda wrote: > Also note that multiple nodes accessing the same disk will not cause > corruption as higher level token mgmt in GPFS will take care of data > consistency. This is exactly what I wanted to be sure. Thanks! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pavel.pokorny at datera.cz Fri Nov 7 11:15:34 2014 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Fri, 7 Nov 2014 12:15:34 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID: Hello to all, I would like to ask question about pagepool and protection of data written through pagepool. Is there a possibility of loosing data written to GPFS in situation that data are stored in pagepool but still not written to disks? I think that for regular file system work this can be solved using GPFS journal. What about using GPFS as a NFS store for VMware datastores? Thank you for your answers, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o. | Ovocn? trh 580/2 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lhorrocks-barlow at ocf.co.uk Wed Nov 5 10:47:06 2014 From: lhorrocks-barlow at ocf.co.uk (Laurence Horrocks- Barlow) Date: Wed, 5 Nov 2014 10:47:06 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459FD15.3070105@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> <5459FD15.3070105@ebi.ac.uk> Message-ID: <545A002A.4080301@ocf.co.uk> Hi Salvatore, GSS and GPFS systems are different beasts. In a traditional GPFS configuration I would expect any NSD server to write to any/all LUN's that it can see as a local disk providing it's part of the same FS. In GSS there is effectively a software RAID level added on top of the disks, with this I would expect only the RG owner to write down to the vdisk. As for corruption, GPFS uses a token system to manage access to LUN's, Metadata, etc. Kind Regards, Laurence Horrocks-Barlow Linux Systems Software Engineer OCF plc Tel: +44 (0)114 257 2200 Fax: +44 (0)114 257 0022 Web: www.ocf.co.uk Blog: blog.ocf.co.uk Twitter: @ocfplc OCF plc is a company registered in England and Wales. Registered number 4132533, VAT number GB 780 6803 14. Registered office address: OCF plc, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG. This message is private and confidential. If you have received this message in error, please notify us and remove it from your system. On 11/05/2014 10:33 AM, Salvatore Di Nardo wrote: > I understand that my test its a bit particular because the client was > also one of the servers. > Usually clients don't have direct access to the storages, but still it > made think, hot the things are supposed to work. > > For example i did another test with 3 dd's, one each server. All the > servers was writing to all the luns. > In other words a lun was accessed in parallel by 3 servers. > > Its that a problem, or gpfs manage properly the concurrency and avoid > data corruption? > I'm asking because i was not expecting a server to write to an NSD he > doesn't own, even if its locally available. > I thought that the general availablity was for failover, not for > parallel access. > > > Regards, > Salvatore > > > > On 05/11/14 10:22, Vic Cornell wrote: >> Hi Salvatore, >> >> If you are doing the IO on the NSD server itself and it can see all >> of the NSDs it will use its "local? access to write to the LUNS. >> >> You need some GPFS clients to see the workload spread across all of >> the NSD servers. >> >> Vic >> >> >> >>> On 5 Nov 2014, at 10:15, Salvatore Di Nardo >> > wrote: >>> >>> Hello again, >>> to understand better GPFS, recently i build up an test gpfs cluster >>> using some old hardware that was going to be retired. THe storage >>> was SAN devices, so instead to use native raids I went for the old >>> school gpfs. the configuration is basically: >>> >>> 3x servers >>> 3x san storages >>> 2x san switches >>> >>> I did no zoning, so all the servers can see all the LUNs, but on nsd >>> creation I gave each LUN a primary, secondary and third server. with >>> the following rule: >>> >>> STORAGE >>> primary >>> secondary >>> tertiary >>> storage1 >>> server1 >>> server2 server3 >>> storage2 server2 server3 server1 >>> storage3 server3 server1 server2 >>> >>> >>> >>> looking at the mmcrnsd, it was my understanding that the primary >>> server is the one that wrote on the NSD unless it fails, then the >>> following server take the ownership of the lun. 
>>> >>> Now come the question: >>> when i did from server 1 a dd surprisingly i discovered that server1 >>> was writing to all the luns. the other 2 server was doing nothing. >>> this behaviour surprises me because on GSS only the RG owner can >>> write, so one server "ask" the other server to write to his own >>> RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on >>> each server. While i understand that the situation it's different >>> I'm puzzled about the fact that all the servers seems able to write >>> to all the luns. >>> >>> SAN deviced usually should be connected to one server only, as >>> paralled access could create data corruption. In environments where >>> you connect a SAN to multiple servers ( example VMWARE cloud) its >>> softeware task to avoid data overwriting between server ( and data >>> corruption ). >>> >>> Honestly, what i was expecting is: server1 writing on his own luns, >>> and data traffic ( ethernet) to the other 2 server , basically >>> asking *them* to write on the other luns. I dont know if this >>> behaviour its normal or not. I triied to find a documentation about >>> that, but could not find any. >>> >>> Could somebody tell me if this _/"every server write to all the >>> luns"/_ its intended or not? >>> >>> Thanks in advance, >>> Salvatore >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dhildeb at us.ibm.com Fri Nov 7 22:42:06 2014 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Fri, 7 Nov 2014 23:42:06 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? In-Reply-To: References: Message-ID: Hi Paul, GPFS correctly implements POSIX semantics and NFS close-to-open semantics. Its a little complicated, but effectively what this means is that when the application issues certain calls to ensure data/metadata is "stable" (e.g., fsync), then it is guaranteed to be stable. It also controls ordering between nodes among many other things. As part of making sure data is stable, the GPFS recovery journal is used in a variety of instances. With VMWare ESX using NFS to GPFS, then the same thing occurs, except the situation is even more simple since every write request will have the 'stable' flag set, ensuring it does writethrough to the storage system. Dean Hildebrand IBM Almaden Research Center From: Pavel Pokorny To: gpfsug-discuss at gpfsug.org Date: 11/07/2014 03:15 AM Subject: [gpfsug-discuss] GPFS - pagepool data protection? Sent by: gpfsug-discuss-bounces at gpfsug.org Hello to all, I would like to ask question about pagepool and protection of data written through pagepool. Is there a possibility of loosing data written to GPFS in situation that data are stored in pagepool but still not written to disks? I think that for regular file system work this can be solved using GPFS journal. What about using GPFS as a NFS store for VMware datastores? Thank you for your answers, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o.?|?Ovocn? 
trh 580/2?|?Praha?|?Czech Republic www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jamiedavis at us.ibm.com Sat Nov 8 23:13:17 2014 From: jamiedavis at us.ibm.com (James Davis) Date: Sat, 8 Nov 2014 18:13:17 -0500 Subject: [gpfsug-discuss] Hi everybody Message-ID: Hey all, My name is Jamie Davis and I work for IBM on the GPFS test team. I'm interested in learning more about how customers use GPFS and what typical questions and issues are like, and I thought joining this mailing list would be a good start. If my presence seems inappropriate or makes anyone uncomfortable I can leave the list. --- I don't know how often GPFS users look in /usr/lpp/mmfs/samples...but while I'm sending a mass email, I thought I'd take a moment to point anyone running GPFS 4.1.0.4 to /usr/lpp/mmfs/samples/ilm/mmfind. mmfind is basically a find-esque wrapper around mmapplypolicy that I wrote in response to complaints I've heard about the learning curve associated with writing policies for mmapplypolicy. Since it's in samples, use-at-your-own-risk and I make no promise that everything works correctly. The -skipPolicy and -saveTmpFiles flags will do everything but actually run mmapplypolicy -- I suggest you double-check its work before you run it on a production system. Please send me any comments on it if you give it a try! Jamie Davis GPFS Test IBM -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Mon Nov 10 16:18:24 2014 From: chair at gpfsug.org (Jez Tucker) Date: Mon, 10 Nov 2014 16:18:24 +0000 Subject: [gpfsug-discuss] SC 14 and storagebeers events this week Message-ID: <5460E550.8020705@gpfsug.org> Hi all Just a quick reminder that the IBM GPFS User Group is at SC '14 in New Orleans Nov 17th. Also, there's a social in London W1 - #storagebeers on Nov 13th. For more info on both of these, please see the main website: www.gpfsug.org Best, Jez -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Tue Nov 11 13:59:38 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Tue, 11 Nov 2014 13:59:38 +0000 Subject: [gpfsug-discuss] storagebeers postponed Message-ID: <5462164A.70607@gpfsug.org> Hi all I've just received notification that #storagebeers, due to happen 13th Nov, has unfortunately had to be postponed. I'll update you all with a new date when I receive it. Very best, Jez From jez at rib-it.org Tue Nov 11 16:49:48 2014 From: jez at rib-it.org (Jez Tucker) Date: Tue, 11 Nov 2014 16:49:48 +0000 Subject: [gpfsug-discuss] Hi everybody In-Reply-To: References: Message-ID: <54623E2C.2070903@rib-it.org> Hi Jamie, You're indeed very welcome. A few of the IBM devs are list members and their presence is appreciated. I suggest if you want to know more regarding use cases etc., ask some pointed questions. Discussion is good. Jez On 08/11/14 23:13, James Davis wrote: > > Hey all, > > My name is Jamie Davis and I work for IBM on the GPFS test team. 
I'm > interested in learning more about how customers use GPFS and what > typical questions and issues are like, and I thought joining this > mailing list would be a good start. If my presence seems inappropriate > or makes anyone uncomfortable I can leave the list. > > --- > > I don't know how often GPFS users look in /usr/lpp/mmfs/samples...but > while I'm sending a mass email, I thought I'd take a moment to point > anyone running GPFS 4.1.0.4 to /usr/lpp/mmfs/samples/ilm/mmfind. > mmfind is basically a find-esque wrapper around mmapplypolicy that I > wrote in response to complaints I've heard about the learning curve > associated with writing policies for mmapplypolicy. Since it's in > samples, use-at-your-own-risk and I make no promise that everything > works correctly. The -skipPolicy and -saveTmpFiles flags will do > everything but actually run mmapplypolicy -- I suggest you > double-check its work before you run it on a production system. > > Please send me any comments on it if you give it a try! > > Jamie Davis > GPFS Test > IBM > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavel.pokorny at datera.cz Wed Nov 12 12:20:57 2014 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Wed, 12 Nov 2014 13:20:57 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID: Hi, thanks. A I understand the write process to GPFS filesystem: 1. Application on a node makes write call 2. Token Manager stuff is done to coordinate the required-byte-range 3. mmfsd gets metadata from the file?s metanode 4. mmfsd acquires a buffer from the page pool 5. Data is moved from application data buffer to page pool buffer 6. VSD layer copies data from the page pool to the send pool and so on. What I am looking at and want to clarify is step 5. Situation when data is moved to page pool. What happen if the server will crash at tjis point? Will GPFS use journal to get to stable state? Thank you, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o. | Ovocn? trh 580/2 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz On Sat, Nov 8, 2014 at 1:00 PM, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: GPFS - pagepool data protection? (Dean Hildebrand) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 7 Nov 2014 23:42:06 +0100 > From: Dean Hildebrand > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS - pagepool data protection? > Message-ID: > < > OF1ED92A57.DD700837-ONC1257D89.007C4EF1-88257D89.007CB453 at us.ibm.com> > Content-Type: text/plain; charset="iso-8859-1" > > > Hi Paul, > > GPFS correctly implements POSIX semantics and NFS close-to-open semantics. 
> Its a little complicated, but effectively what this means is that when the > application issues certain calls to ensure data/metadata is "stable" (e.g., > fsync), then it is guaranteed to be stable. It also controls ordering > between nodes among many other things. As part of making sure data is > stable, the GPFS recovery journal is used in a variety of instances. > > With VMWare ESX using NFS to GPFS, then the same thing occurs, except the > situation is even more simple since every write request will have the > 'stable' flag set, ensuring it does writethrough to the storage system. > > Dean Hildebrand > IBM Almaden Research Center > > > > > From: Pavel Pokorny > To: gpfsug-discuss at gpfsug.org > Date: 11/07/2014 03:15 AM > Subject: [gpfsug-discuss] GPFS - pagepool data protection? > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > Hello to all, > I would like to ask question about pagepool and protection of data written > through pagepool. > Is there a possibility of loosing data written to GPFS in situation that > data are stored in pagepool but still not written to disks? > I think that for regular file system work this can be solved using GPFS > journal. What about using GPFS as a NFS store for VMware datastores? > Thank you for your answers, > Pavel > -- > Ing. Pavel Pokorn? > DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic > www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.html > > > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: graycol.gif > Type: image/gif > Size: 105 bytes > Desc: not available > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.gif > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 34, Issue 7 > ********************************************* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Wed Nov 12 14:05:03 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Wed, 12 Nov 2014 15:05:03 +0100 Subject: [gpfsug-discuss] IBM Software Defined Infrastructure Roadshow 2014 (Frankfurt, London & Paris) Message-ID: FYI: IBM Software Defined Infrastructure Roadshow 2014 (Frankfurt 02. Dec 2014, London 03. Dec 2014 & Paris 04. Dec 2014) https://www-950.ibm.com/events/wwe/grp/grp019.nsf/v17_events?openform&lp=platform_computing_roadshow&locale=en_GB P.S. The German GPFS technical team will be available for discussions in Frankfurt. Feel free to contact me. -frank- Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From dhildeb at us.ibm.com Sat Nov 15 20:31:53 2014 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Sat, 15 Nov 2014 12:31:53 -0800 Subject: [gpfsug-discuss] GPFS - pagepool data protection? 
In-Reply-To: References: Message-ID: Hi Pavel, You are more or less right in your description, but the key that I tried to convey in my first email is that GPFS only obey's POSIX. So your question can be answered by looking at how your application performs the write and does your application ask to make the data live only in the pagepool or on stable storage. By default posix says that file create and writes are unstable, so just doing a write puts it in the pagepool and will be lost if a crash occurs immediately after. To make it stable, the application must do something in posix to make it stable, of which there are many ways to do so, including but not limited to O_SYNC, DIO, some form of fsync post write, etc, etc... Dean Hildebrand IBM Almaden Research Center From: Pavel Pokorny To: gpfsug-discuss at gpfsug.org Date: 11/12/2014 04:21 AM Subject: Re: [gpfsug-discuss] GPFS - pagepool data protection? Sent by: gpfsug-discuss-bounces at gpfsug.org Hi, thanks. A I understand the write process to GPFS filesystem: 1.?Application on a node makes write call 2.?Token Manager stuff is done to coordinate the required-byte-range 3.?mmfsd gets metadata from the file?s metanode 4.?mmfsd acquires a buffer from the page pool 5.?Data is moved from application data buffer to page pool buffer 6. VSD layer copies data from the page pool to the send pool ?and so on. What I am looking at and want to clarify is step 5. Situation when data is moved to page pool. What happen if the server will crash at tjis point? Will GPFS use journal to get to stable state? Thank you, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz On Sat, Nov 8, 2014 at 1:00 PM, wrote: Send gpfsug-discuss mailing list submissions to ? ? ? ? gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit ? ? ? ? http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to ? ? ? ? gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at ? ? ? ? gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: ? ?1. Re: GPFS - pagepool data protection? (Dean Hildebrand) ---------------------------------------------------------------------- Message: 1 Date: Fri, 7 Nov 2014 23:42:06 +0100 From: Dean Hildebrand To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID: ? ? ? ? < OF1ED92A57.DD700837-ONC1257D89.007C4EF1-88257D89.007CB453 at us.ibm.com> Content-Type: text/plain; charset="iso-8859-1" Hi Paul, GPFS correctly implements POSIX semantics and NFS close-to-open semantics. Its a little complicated, but effectively what this means is that when the application issues certain calls to ensure data/metadata is "stable" (e.g., fsync), then it is guaranteed to be stable.? It also controls ordering between nodes among many other things.? As part of making sure data is stable, the GPFS recovery journal is used in a variety of instances. With VMWare ESX using NFS to GPFS, then the same thing occurs, except the situation is even more simple since every write request will have the 'stable' flag set, ensuring it does writethrough to the storage system. Dean Hildebrand IBM Almaden Research Center From:? ?Pavel Pokorny To:? ? ?gpfsug-discuss at gpfsug.org Date:? ?11/07/2014 03:15 AM Subject:? ? ? ? 
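To make the O_SYNC / fsync point near the top of this message concrete, a small sketch from the shell (the /gpfs/fs1 mount point and file names are placeholders, not a recommendation):

    # buffered write: the data may still be in the pagepool when dd returns
    dd if=/dev/zero of=/gpfs/fs1/unstable.dat bs=1M count=100

    # O_SYNC-style write-through: each write returns only once the data
    # has reached stable storage
    dd if=/dev/zero of=/gpfs/fs1/stable.dat bs=1M count=100 oflag=sync

    # or write buffered and issue a single fsync at the end of the copy
    dd if=/dev/zero of=/gpfs/fs1/stable2.dat bs=1M count=100 conv=fsync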
[gpfsug-discuss] GPFS - pagepool data protection? Sent by:? ? ? ? gpfsug-discuss-bounces at gpfsug.org Hello to all, I would like to ask question about pagepool and protection of data written through pagepool. Is there a possibility of loosing data written to GPFS in situation that data are stored in pagepool but still not written to disks? I think that for regular file system work this can be solved using GPFS journal. What about using GPFS as a NFS store for VMware datastores? Thank you for your answers, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.html > -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.gif > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 34, Issue 7 ********************************************* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From seanlee at tw.ibm.com Mon Nov 17 09:49:39 2014 From: seanlee at tw.ibm.com (Sean S Lee) Date: Mon, 17 Nov 2014 17:49:39 +0800 Subject: [gpfsug-discuss] GPFS - pagepool data protection? In-Reply-To: References: Message-ID: Hi Pavel, Most popular filesystems work that way. Write buffering improves the performance at the expense of some risk. Today most applications and all modern OS correctly handle "crash consistency", meaning they can recover from uncommitted writes. If you have data which absolutely cannot tolerate any "in-flight" data loss, it requires significant planning and resources on multiple levels, but as far as GPFS is concerned you could create a small file system and data (VMDK's) or serve GPFS or cNFS (mount GPFS with "syncfs", mount NFS with sync,no_wdelay) to VM clients from those filesystems. Your VM OS (VMDK) could be on a regular GPFS file system and your app data and logs could be on a small GPFS with synchronous writes. Regards Sean -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavel.pokorny at datera.cz Mon Nov 17 12:49:26 2014 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Mon, 17 Nov 2014 13:49:26 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID: Hello, thanks you for all the answers, It is more clear now. Regards, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o. | Ovocn? 
trh 580/2 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz On Mon, Nov 17, 2014 at 1:00 PM, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. GPFS - pagepool data protection? (Sean S Lee) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 17 Nov 2014 17:49:39 +0800 > From: Sean S Lee > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] GPFS - pagepool data protection? > Message-ID: > < > OF20A72494.9E59B93F-ON48257D93.00350BA6-48257D93.0035F912 at tw.ibm.com> > Content-Type: text/plain; charset="us-ascii" > > > Hi Pavel, > > Most popular filesystems work that way. > > Write buffering improves the performance at the expense of some risk. > Today most applications and all modern OS correctly handle "crash > consistency", meaning they can recover from uncommitted writes. > > If you have data which absolutely cannot tolerate any "in-flight" data > loss, it requires significant planning and resources on multiple levels, > but as far as GPFS is concerned you could create a small file system and > data (VMDK's) or serve GPFS or cNFS (mount GPFS with "syncfs", mount NFS > with sync,no_wdelay) to VM clients from those filesystems. > Your VM OS (VMDK) could be on a regular GPFS file system and your app data > and logs could be on a small GPFS with synchronous writes. > > Regards > Sean > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141117/1eb905cc/attachment-0001.html > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 34, Issue 13 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From orlando.richards at ed.ac.uk Wed Nov 19 16:35:44 2014 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Wed, 19 Nov 2014 16:35:44 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests Message-ID: <546CC6E0.1010800@ed.ac.uk> Hi folks, Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". --- Orlando -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
From S.J.Thompson at bham.ac.uk Wed Nov 19 18:36:30 2014 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 19 Nov 2014 18:36:30 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: <546CC6E0.1010800@ed.ac.uk> References: <546CC6E0.1010800@ed.ac.uk> Message-ID: I was asking this question at the gpfs forum on Monday at sc, but there didn't seem to be much in how wr could do it. One of the suggestions was to basically use nfs, or there is the Manilla compnents of Openstack coming, but still that isn't really true gpfs access. I did wonder about virtio, but whether that would work with gpfs passed from the hosting system. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Orlando Richards [orlando.richards at ed.ac.uk] Sent: 19 November 2014 16:35 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] GPFS inside OpenStack guests Hi folks, Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". --- Orlando -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Wed Nov 19 19:00:50 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 19 Nov 2014 11:00:50 -0800 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: <546CC6E0.1010800@ed.ac.uk> References: <546CC6E0.1010800@ed.ac.uk> Message-ID: technically there are multiple ways to do this. 1. you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) 2. you attach the physical disks as virtio block devices 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... Sven On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards wrote: > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to > connect to an existing (traditional, "bare metal") GPFS filesystem owning > cluster? > > This is not using GPFS for openstack block/image storage - but using GPFS > as a "NAS" service, with openstack guest instances as as a "GPFS client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in Scotland, > with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Wed Nov 19 19:03:55 2014 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 19 Nov 2014 19:03:55 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk>, Message-ID: Yes, what about the random name nature of a vm image? For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Sven Oehme [oehmes at gmail.com] Sent: 19 November 2014 19:00 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS inside OpenStack guests technically there are multiple ways to do this. 1. you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) 2. you attach the physical disks as virtio block devices 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... Sven On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards > wrote: Hi folks, Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". --- Orlando -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chekh at stanford.edu Wed Nov 19 19:37:50 2014 From: chekh at stanford.edu (Alex Chekholko) Date: Wed, 19 Nov 2014 11:37:50 -0800 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk>, Message-ID: <546CF18E.3010802@stanford.edu> Just make the new VMs NFS clients, no? It's so much simpler and the performance is not much less. But you do need to run CNFS in the GPFS cluster. On 11/19/14 11:03 AM, Simon Thompson (Research Computing - IT Services) wrote: > > Yes, what about the random name nature of a vm image? > > For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? > > And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Sven Oehme [oehmes at gmail.com] > Sent: 19 November 2014 19:00 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS inside OpenStack guests > > technically there are multiple ways to do this. > > 1. 
you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) > 2. you attach the physical disks as virtio block devices > 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. > > if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... > > Sven > > > > > > On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards > wrote: > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? > > This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Alex Chekholko chekh at stanford.edu From orlando.richards at ed.ac.uk Wed Nov 19 20:56:32 2014 From: orlando.richards at ed.ac.uk (orlando.richards at ed.ac.uk) Date: Wed, 19 Nov 2014 20:56:32 +0000 (GMT) Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk> Message-ID: On Wed, 19 Nov 2014, Simon Thompson (Research Computing - IT Services) wrote: > > I was asking this question at the gpfs forum on Monday at sc, but there didn't seem to be much in how wr could do it. > > One of the suggestions was to basically use nfs, or there is the Manilla compnents of Openstack coming, but still that isn't really true gpfs access. > NFS should be easy enough - but you can lose a lot of the gpfs good-ness by doing that (acl's, cloning, performance?, etc). > I did wonder about virtio, but whether that would work with gpfs passed from the hosting system. I was more looking for something fairly native - so that we don't have to, for example, start heavily customising the hypervisor stack. In fact - if you're pushing out to a third-party service provider cloud (and that could be your internal organisation's cloud run as a separate service) then you don't have that option at all. I've not dug into virtio much in a basic kvm hypervisor, but one of the guys in EPCC has been trying it out. Initial impressions (once he got it working!) were tarred by terrible performance. I've not caught up with how he got on after that initial look. > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Orlando Richards [orlando.richards at ed.ac.uk] > Sent: 19 November 2014 16:35 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] GPFS inside OpenStack guests > > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to > connect to an existing (traditional, "bare metal") GPFS filesystem > owning cluster? 
> > This is not using GPFS for openstack block/image storage - but using > GPFS as a "NAS" service, with openstack guest instances as as a "GPFS > client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From orlando.richards at ed.ac.uk Wed Nov 19 20:56:38 2014 From: orlando.richards at ed.ac.uk (orlando.richards at ed.ac.uk) Date: Wed, 19 Nov 2014 20:56:38 +0000 (GMT) Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk>, Message-ID: On Wed, 19 Nov 2014, Simon Thompson (Research Computing - IT Services) wrote: > > Yes, what about the random name nature of a vm image? > > > For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? I *think* this bit should be solvable - assuming one can pre-define the range of names the node will have, and can pre-populate your gpfs cluster config with these node names. The guest image should then have the full /var/mmfs tree (pulled from another gpfs node), but with the /var/mmfs/gen/mmfsNodeData file removed. When it starts up, it'll figure out "who" it is and regenerate that file, pull the latest cluster config from the primary config server, and start up. > And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? This bit is where I can see the potential pitfall. OpenStack naturally uses NAT to handle traffic to and from guests - will GPFS cope with nat'ted clients in this way? Fair point on NFS from Alex - but will you get the same multi-threaded performance from NFS compared with GPFS? Also - could you make each hypervisor an NFS server for its guests, thus doing away with the need for CNFS, and removing the potential for the nfs server threads bottlenecking? For instance - if I have 300 worker nodes, and 7 NSD servers - I'd then have 300 NFS servers running, rather than 7 NFS servers. Direct block access to the storage from the hypervisor would also be possible (network configuration permitting), and the NFS traffic would flow only over a "virtual" network within the hypervisor, and so "should" (?) be more efficient. > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Sven Oehme [oehmes at gmail.com] > Sent: 19 November 2014 19:00 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS inside OpenStack guests > > technically there are multiple ways to do this. > > 1. 
you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) > 2. you attach the physical disks as virtio block devices > 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. > > if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... > > Sven > > > > > > On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards > wrote: > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? > > This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From S.J.Thompson at bham.ac.uk Thu Nov 20 00:20:44 2014 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 20 Nov 2014 00:20:44 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk> Message-ID: On 19/11/2014 14:56, "orlando.richards at ed.ac.uk" wrote: >> >>And how about attaching to the netowkrk as neutron networking uses per >>tenant networks, so how would you actually get access to the gpfs >>cluster? > >This bit is where I can see the potential pitfall. OpenStack naturally >uses NAT to handle traffic to and from guests - will GPFS cope with >nat'ted clients in this way? Well, not necessarily, I was thinking about this and potentially you could create an external shared network which is bound to your GPFS interface, though there?s possible security questions maybe around exposing a real internal network device into a VM. I think there is also a Mellanox driver for the VPI Pro cards which allow you to pass the card through to instances. I can?t remember if that was just acceleration for Ethernet or if it could do IB as well. >Also - could you make each hypervisor an NFS server for its guests, thus >doing away with the need for CNFS, and removing the potential for the nfs >server threads bottlenecking? For instance - if I have 300 worker nodes, >and 7 NSD servers - I'd then have 300 NFS servers running, rather than 7 Would you then not need to have 300 server licenses though? 
Simon From jonathan at buzzard.me.uk Thu Nov 20 10:03:01 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 20 Nov 2014 10:03:01 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk> , Message-ID: <1416477781.4171.23.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-11-19 at 20:56 +0000, orlando.richards at ed.ac.uk wrote: > On Wed, 19 Nov 2014, Simon Thompson (Research Computing - IT Services) > wrote: > > > > > Yes, what about the random name nature of a vm image? > > > > > > For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? > > > I *think* this bit should be solvable - assuming one can pre-define the > range of names the node will have, and can pre-populate your gpfs cluster > config with these node names. The guest image should then have the full > /var/mmfs tree (pulled from another gpfs node), but with the > /var/mmfs/gen/mmfsNodeData file removed. When it starts up, it'll figure > out "who" it is and regenerate that file, pull the latest cluster config > from the primary config server, and start up. It's perfectly solvable with a bit of scripting and putting the cluster into admin mode central. > > > > > And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? > > This bit is where I can see the potential pitfall. OpenStack naturally > uses NAT to handle traffic to and from guests - will GPFS cope with > nat'ted clients in this way? Not going to work with NAT. GPFS has some "funny" ideas about networking, but to put it succinctly all the nodes have to be on the same class A, B or C network. Though it considers every address in a class A network to be on the same network even though you may have divided it up internally into different networks. Consequently the network model in GPFS is broken. You would need to use bridged mode aka FlatNetworking in OpenStacks for this to work, but surely Jan knows all this. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From janfrode at tanso.net Fri Nov 21 19:35:48 2014 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 21 Nov 2014 20:35:48 +0100 Subject: [gpfsug-discuss] Gathering node/fs statistics ? Message-ID: <20141121193548.GA11920@mushkin.tanso.net> I'm considering writing a Performance CoPilot agent (PMDA, Performance Metrics Domain Agent) for GPFS, and would like to collect all/most of the metrics that are already available in the gpfs SNMP agent -- ideally without using SNMP.. So, could someone help me with where to find GPFS performance data? I've noticed "mmfsadm" has a "resetstats" option, but what are these stats / where can I find them? All in mmpmon? Also the GPFS-MIB.txt seems to point at some commands I'm unfamiliar with: -- all other node data from EE "get nodes" command -- Status info from EE "get fs -b" command -- Performance data from mmpmon "gfis" command -- Storage pool table comes from EE "get pools" command -- Storage pool data comes from SDR and EE "get pools" command -- Disk data from EE "get fs" command -- Disk performance data from mmpmon "ds" command: -- From mmpmon nc: Any idea what 'EE "get nodes"' is? And what do they mean by 'mmpmon "gfis"', "nc" or "ds"? These commands doesn't work when fed to mmpmon.. 
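For reference, the externally documented mmpmon requests (io_s, fs_io_s, reset, ...) can be fed on standard input; a minimal sketch, assuming a default install path, with no claim that it maps onto the internal names quoted from the MIB comments above:

    # per-node I/O totals and per-file-system counters, parseable output (-p)
    echo io_s    | /usr/lpp/mmfs/bin/mmpmon -p
    echo fs_io_s | /usr/lpp/mmfs/bin/mmpmon -p

    # zero the counters between samples
    echo reset   | /usr/lpp/mmfs/bin/mmpmon -p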
-jf From oehmes at gmail.com Fri Nov 21 20:15:16 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 21 Nov 2014 12:15:16 -0800 Subject: [gpfsug-discuss] Gathering node/fs statistics ? In-Reply-To: <20141121193548.GA11920@mushkin.tanso.net> References: <20141121193548.GA11920@mushkin.tanso.net> Message-ID: Hi, you should take a look at the following 3 links : my performance talk about GPFS , take a look at the dstat plugin mentioned in the charts : http://www.gpfsug.org/wp-content/uploads/2014/05/UG10_GPFS_Performance_Session_v10.pdf documentation about the mmpmon interface and use in GPFS : http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_mmpmonch.htm documentation about GSS/ESS/GNR in case you care about this as well and its additional mmpmon commands : http://www-01.ibm.com/support/knowledgecenter/SSFKCN/bl1du14a.pdf Sven On Fri, Nov 21, 2014 at 11:35 AM, Jan-Frode Myklebust wrote: > I'm considering writing a Performance CoPilot agent (PMDA, Performance > Metrics Domain Agent) for GPFS, and would like to collect all/most of > the metrics that are already available in the gpfs SNMP agent -- ideally > without using SNMP.. > > So, could someone help me with where to find GPFS performance data? I've > noticed "mmfsadm" has a "resetstats" option, but what are these stats / > where can I find them? All in mmpmon? > > Also the GPFS-MIB.txt seems to point at some commands I'm unfamiliar > with: > > -- all other node data from EE "get nodes" command > -- Status info from EE "get fs -b" command > -- Performance data from mmpmon "gfis" command > -- Storage pool table comes from EE "get pools" command > -- Storage pool data comes from SDR and EE "get pools" command > -- Disk data from EE "get fs" command > -- Disk performance data from mmpmon "ds" command: > -- From mmpmon nc: > > > Any idea what 'EE "get nodes"' is? And what do they mean by 'mmpmon > "gfis"', "nc" or "ds"? These commands doesn't work when fed to mmpmon.. > > > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Nov 21 20:29:05 2014 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 21 Nov 2014 14:29:05 -0600 Subject: [gpfsug-discuss] Gathering node/fs statistics ? In-Reply-To: <20141121193548.GA11920@mushkin.tanso.net> References: <20141121193548.GA11920@mushkin.tanso.net> Message-ID: You might want to look at Arxview, www.arxscan.com. I've been working with them and they have good GPFS and Storage monitoring based on mmpmon. Lightweight too. Bob Oesterlin Nuance Communications On Friday, November 21, 2014, Jan-Frode Myklebust wrote: > I'm considering writing a Performance CoPilot agent (PMDA, Performance > Metrics Domain Agent) for GPFS, and would like to collect all/most of > the metrics that are already available in the gpfs SNMP agent -- ideally > without using SNMP.. > > So, could someone help me with where to find GPFS performance data? I've > noticed "mmfsadm" has a "resetstats" option, but what are these stats / > where can I find them? All in mmpmon? 
> > Also the GPFS-MIB.txt seems to point at some commands I'm unfamiliar > with: > > -- all other node data from EE "get nodes" command > -- Status info from EE "get fs -b" command > -- Performance data from mmpmon "gfis" command > -- Storage pool table comes from EE "get pools" command > -- Storage pool data comes from SDR and EE "get pools" command > -- Disk data from EE "get fs" command > -- Disk performance data from mmpmon "ds" command: > -- From mmpmon nc: > > > Any idea what 'EE "get nodes"' is? And what do they mean by 'mmpmon > "gfis"', "nc" or "ds"? These commands doesn't work when fed to mmpmon.. > > > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL:

From sabujp at gmail.com Fri Nov 21 22:50:02 2014 From: sabujp at gmail.com (Sabuj Pattanayek) Date: Fri, 21 Nov 2014 16:50:02 -0600 Subject: [gpfsug-discuss] any difference with the filespace view mmbackup sees from a global snapshot vs a snapshot on -j root with only 1 independent fileset (root)? Message-ID: Hi all, We're running 3.5.0.19. Is there any difference in terms of the view of the filespace that mmbackup sees and then passes to TSM if we run mmbackup against a global snapshot vs a snapshot on -j root, if we only have and ever plan on having one independent fileset (root)? It doesn't look like it to me just from ls, but just verifying. We want to get away from using a global snapshot if possible (and start using -j root snapshots instead) because for some reason it looks like it takes much much longer to run mmdelsnapshot on a global snapshot vs a snapshot on the root fileset. Thanks, Sabuj -------------- next part -------------- An HTML attachment was scrubbed... URL:

From stijn.deweirdt at ugent.be Mon Nov 24 21:22:19 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Mon, 24 Nov 2014 22:22:19 +0100 Subject: [gpfsug-discuss] restripe or not Message-ID: <5473A18B.7000702@ugent.be> hi all, we are going to expand an existing filesystem by approx 50% capacity. the current filesystem is 75% full. we are in downtime (for more than just this reason), so we can take the IO rebalance hit for a while (say max 48 hours). my questions: a. do we really need to rebalance? the mmadddisk page suggests normally it's going to be ok, but i never understood that. new data will end up mainly on new disks, so wrt performance, this can't really work out, can it? b. can we change the priority of rebalancing somehow (fewer nodes taking part in the rebalance?) c. once we start the rebalance, how safe is it to stop with kill or ctrl-c? (or can we say eg. rebalance 25% now, rest later?) (and how often can we do this? eg a daily cron job to restripe at max one hour per day -- would this cause issues in the long term?) many thanks, stijn

From zgiles at gmail.com Mon Nov 24 23:14:21 2014 From: zgiles at gmail.com (Zachary Giles) Date: Mon, 24 Nov 2014 18:14:21 -0500 Subject: [gpfsug-discuss] restripe or not In-Reply-To: <5473A18B.7000702@ugent.be> References: <5473A18B.7000702@ugent.be> Message-ID: Interesting question.. Just some thoughts: Not an expert on restriping myself: * Your new storage -- is it the same size, shape, speed as the old storage? If not, then are you going to add it to the same storage pool, or an additional storage pool?
If additional, restripe is not needed, as you can't / don't need to restripe across storage pools, the data will be in one or the other. However, you of course will need to make a policy to place data correctly. Of course, if you're going to double your storage and all your new data will be written to the new disks, then you may be leaving quite a bit of capacity on the floor. * mmadddisk man page and normal balancing -- yes, we've seen this suggestion as well -- that is, that new data will generally fill across the cluster and eventually fill in the gaps. We didn't restripe on a much smaller storage pool and it eventually did balance out, however, it was also a "tier 1" where data is migrated out often. If I were doubling my primary storage with more of the exact same disks, I'd probably restripe. * Stopping a restripe -- I'm """ Pretty Sure """ you can stop a restripe safely with a Ctrl-C. I'm """ Pretty Sure """ we've done that a few times ourselves with no problem. I remember I was going to restripe something but the estimates were too high and so I stopped it. I'd feel fairly confident in doing it, but I don't want to take responsibility for your storage. :) :) I don't think there's a need to restripe every hour or anything. If you're generally balanced at one point, you'd probably continue to be under normal operation. On Mon, Nov 24, 2014 at 4:22 PM, Stijn De Weirdt wrote: > hi all, > > we are going to expand an existing filestytem with approx 50% capacity. > the current filesystem is 75% full. > > we are in downtime (for more then just this reason), so we can take the IO > rebalance hit for a while (say max 48hours). > > my questions: > a. do we really need to rebalance? the mmadddisk page suggest normally > it's going to be ok, but i never understood that. new data will end up > mainly on new disks, so wrt to performance, this can't really work out, can > it? > b. can we change the priority of rebalancing somehow (fewer nodes taking > part in the rebalance?) > c. once we start the rebalance, how save is it to stop with kill or ctrl-c > (or can we say eg. rebalance 25% now, rest later?) > (and how often can we do this? eg a daily cron job to restripe at max one > hour per day, would this cause issue in the long term > > > many thanks, > > stijn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Tue Nov 25 02:01:06 2014 From: oester at gmail.com (Bob Oesterlin) Date: Mon, 24 Nov 2014 20:01:06 -0600 Subject: [gpfsug-discuss] restripe or not In-Reply-To: <5473A18B.7000702@ugent.be> References: <5473A18B.7000702@ugent.be> Message-ID: In general, the need to restripe after a disk add is dependent on a number of factors, as has been pointed out.. A couple of other thoughts/suggestions: - One thing you might consider (depending on your pattern of read/write traffic), is selectively suspending one or more of the existing NSDs, forcing GPFS to write new blocks to the new NSDs. That way at least some of the new data is being written to the new storage by default, rather than using up blocks on the existing NSDs. You can suspend/resume disks at any time. - You can pick a subset of nodes to perform the restripe with "mmrestripefs -N node1,node2,..." 
Keep in mind you'll get much better performance and less impact to the filesystem if you choose NSD servers with direct access to the disk. - Resume of restripe: Yes, you can do this, no harm, done it many times. You can track the balance of the disks using "mmdf". This is a pretty intensive command, so I wouldn't run it frequently. Check it a few times each day, see if the data balance is improving by itself. When you stop/restart it, the restripe doesn't pick up exactly where it left off, it's going to scan the entire file system again. - You can also restripe single files if they are large and get heavy I/O (mmrestripefile) Bob Oesterlin Nuance Communications On Mon, Nov 24, 2014 at 3:22 PM, Stijn De Weirdt wrote: > hi all, > > we are going to expand an existing filestytem with approx 50% capacity. > the current filesystem is 75% full. > > we are in downtime (for more then just this reason), so we can take the IO > rebalance hit for a while (say max 48hours). > > my questions: > a. do we really need to rebalance? the mmadddisk page suggest normally > it's going to be ok, but i never understood that. new data will end up > mainly on new disks, so wrt to performance, this can't really work out, can > it? > b. can we change the priority of rebalancing somehow (fewer nodes taking > part in the rebalance?) > c. once we start the rebalance, how save is it to stop with kill or ctrl-c > (or can we say eg. rebalance 25% now, rest later?) > (and how often can we do this? eg a daily cron job to restripe at max one > hour per day, would this cause issue in the long term > > > many thanks, > > stijn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL:

From stijn.deweirdt at ugent.be Tue Nov 25 07:17:56 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 25 Nov 2014 08:17:56 +0100 Subject: [gpfsug-discuss] restripe or not In-Reply-To: References: <5473A18B.7000702@ugent.be> Message-ID: <54742D24.8090602@ugent.be> hi zachary, > * Your new storage -- is it the same size, shape, speed as the old storage? yes. we created and used it as "test" filesystem on the same hardware when we started. now we are shrinking the test filesystem and adding the free disks to the production one. > If not, then are you going to add it to the same storage pool, or an > additional storage pool? If additional, restripe is not needed, as you > can't / don't need to restripe across storage pools, the data will be in > one or the other. However, you of course will need to make a policy to > place data correctly. sure, but in this case, they end up in the same pool. > Of course, if you're going to double your storage and all your new data > will be written to the new disks, then you may be leaving quite a bit of > capacity on the floor. > > * mmadddisk man page and normal balancing -- yes, we've seen this > suggestion as well -- that is, that new data will generally fill across the > cluster and eventually fill in the gaps. We didn't restripe on a much > smaller storage pool and it eventually did balance out, however, it was > also a "tier 1" where data is migrated out often. If I were doubling my > primary storage with more of the exact same disks, I'd probably restripe. more than half of the data on the current filesystem is more or less static (we expect it to stay there 2-3 years unmodified). similar data will be added in the near future.
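(fwiw, the concrete plan for the downtime window is roughly the following -- filesystem and stanza file names are made up, and this is untested, so take it as a sketch:

    # add the freed-up disks to the production filesystem, without rebalancing yet
    mmadddisk prodfs -F /root/new_disks.stanza

    # then kick off the rebalance and see how far it gets within the 48 hours
    mmrestripefs prodfs -b

)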
> > * Stopping a restripe -- I'm """ Pretty Sure """ you can stop a restripe > safely with a Ctrl-C. I'm """ Pretty Sure """ we've done that a few times > ourselves with no problem. I remember I was going to restripe something but > the estimates were too high and so I stopped it. I'd feel fairly confident > in doing it, but I don't want to take responsibility for your storage. :) yeah, i also remember cancelling a restripe and i'm pretty sure it didn't cause problems (i would certainly remember the problems ;). i'm looking for some further confirmation (or e.g. a reference to some document that says so); i vaguely remember sven(?) saying this at the london gpfs user day this year. > :) I don't think there's a need to restripe every hour or anything. If > you're generally balanced at one point, you'd probably continue to be under > normal operation. i was thinking of spreading the total restripe over one or 2 hour periods each day over the coming week(s); but i'm now realising this might not be the best idea, because it will rebalance any new data as well, slowing down the bulk rebalancing. anyway, thanks for the feedback. i'll probably let the rebalance run for 48 hours and see how far it got by that time. stijn > > > > > On Mon, Nov 24, 2014 at 4:22 PM, Stijn De Weirdt > wrote: >> hi all, >> >> we are going to expand an existing filestytem with approx 50% capacity. >> the current filesystem is 75% full. >> >> we are in downtime (for more then just this reason), so we can take the IO >> rebalance hit for a while (say max 48hours). >> >> my questions: >> a. do we really need to rebalance? the mmadddisk page suggest normally >> it's going to be ok, but i never understood that. new data will end up >> mainly on new disks, so wrt to performance, this can't really work out, can >> it? >> b. can we change the priority of rebalancing somehow (fewer nodes taking >> part in the rebalance?) >> c. once we start the rebalance, how save is it to stop with kill or ctrl-c >> (or can we say eg. rebalance 25% now, rest later?) >> (and how often can we do this? eg a daily cron job to restripe at max one >> hour per day, would this cause issue in the long term >> >> >> many thanks, >> >> stijn >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >

From stijn.deweirdt at ugent.be Tue Nov 25 07:23:41 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 25 Nov 2014 08:23:41 +0100 Subject: [gpfsug-discuss] restripe or not In-Reply-To: References: <5473A18B.7000702@ugent.be> Message-ID: <54742E7D.7090009@ugent.be> hi bob, > - One thing you might consider (depending on your pattern of read/write > traffic), is selectively suspending one or more of the existing NSDs, > forcing GPFS to write new blocks to the new NSDs. That way at least some of > the new data is being written to the new storage by default, rather than > using up blocks on the existing NSDs. You can suspend/resume disks at any > time. is the gpfs placement weighted with the available volume? i'd rather not make this a manual operation. > > - You can pick a subset of nodes to perform the restripe with "mmrestripefs > -N node1,node2,..." Keep in mind you'll get much better performance and > less impact to the filesystem if you choose NSD servers with direct access > to the disk. yes and no, i guess. our nsds see all disks, but the problem with nsds is that they don't honour any roles (our primary nsds have the preferred path to the controller and lun, meaning all access from a non-primary nsd to that disk is suboptimal).
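(if we do end up restricting the restripe to the nsd servers anyway, i guess the invocation would simply be something like the following -- filesystem and node names made up:

    mmrestripefs prodfs -b -N nsd01,nsd02,nsd03

)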
> > - Resume of restripe: Yes, you can do this, no harm, done it many times. > You can track the balance of the disks using "mmdf ". This is a > pretty intensive command, so I wouldn't run in frequently. Check it a few > times each day, see if the data balance is improving by itself. thanks for the tip to monitor it with mmdf! > When you stop/restart it, the restripe doesn't pick up exactly where it left off, > it's going to scan the entire file system again. yeah, i realised that this is a flaw in my "one-hour a day" restripe idea ;) > > - You can also restripe single files if the are large and get a heavy I/O > (mmrestripefile) excellent tip! forgot about that one. if the rebalance is too slow, i can run this on the static data. thanks a lot for the feedback stijn > > Bob Oesterlin > Nuance Communications > > > On Mon, Nov 24, 2014 at 3:22 PM, Stijn De Weirdt > wrote: >> hi all, >> >> we are going to expand an existing filestytem with approx 50% capacity. >> the current filesystem is 75% full. >> >> we are in downtime (for more then just this reason), so we can take the IO >> rebalance hit for a while (say max 48hours). >> >> my questions: >> a. do we really need to rebalance? the mmadddisk page suggest normally >> it's going to be ok, but i never understood that. new data will end up >> mainly on new disks, so wrt to performance, this can't really work out, can >> it? >> b. can we change the priority of rebalancing somehow (fewer nodes taking >> part in the rebalance?) >> c. once we start the rebalance, how save is it to stop with kill or ctrl-c >> (or can we say eg. rebalance 25% now, rest later?) >> (and how often can we do this? eg a daily cron job to restripe at max one >> hour per day, would this cause issue in the long term >> >> >> many thanks, >> >> stijn >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>

From L.A.Hurst at bham.ac.uk Tue Nov 25 10:45:51 2014 From: L.A.Hurst at bham.ac.uk (Laurence Alexander Hurst) Date: Tue, 25 Nov 2014 10:45:51 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users Message-ID: Hi all, We have noticed that once users are deleted (gone entirely from the passwd backend) and all their files removed, their uid is still reported by GPFS' quota tools (albeit with zero files and space usage). Does anyone know if there is a way to clear out these spurious entries or is it a case that once a uid is in the quota database it's there forever, regardless of whether that uid is still in use and has quota to record? Many Thanks, Laurence

From ewahl at osc.edu Tue Nov 25 13:52:55 2014 From: ewahl at osc.edu (Wahl, Edward) Date: Tue, 25 Nov 2014 13:52:55 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: References: Message-ID: <1416923575.2343.18.camel@localhost.localdomain> Do you still have policies or filesets associated with these users?
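e.g. a quick look at the installed policy and the fileset list might turn something up (device name is a guess, adjust to yours):

    mmlspolicy gpfs -L
    mmlsfileset gpfs -L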
Ed Wahl OSC On Tue, 2014-11-25 at 10:45 +0000, Laurence Alexander Hurst wrote: > Hi all, > > We have noticed that once users are deleted (gone entirely from the passwd backend) and all their files removed, their uid is still reported by GPFS? quota tools (albeit with zero files and space usage). > > Does anyone know if there is a way to clear out these spurious entries or is it a case that once a uid is in the quota database its there forever regardless of if that uid is still in use and has quota to record? > > Many Thanks, > > Laurence > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From jonathan at buzzard.me.uk Tue Nov 25 14:00:29 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Nov 2014 14:00:29 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: References: Message-ID: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> On Tue, 2014-11-25 at 10:45 +0000, Laurence Alexander Hurst wrote: > Hi all, > > We have noticed that once users are deleted (gone entirely from the > passwd backend) and all their files removed, their uid is still > reported by GPFS? quota tools (albeit with zero files and space usage). > > There is something somewhere that references them, because they do disappear. I know because I cleared out a GPFS file system that had files and directories used by "depreciated" user and group names, and the check I was using to make sure I had got everything belonging to a particular user or group was mmrepquota. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom.

From stijn.deweirdt at ugent.be Tue Nov 25 16:25:58 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 25 Nov 2014 17:25:58 +0100 Subject: [gpfsug-discuss] gpfs.gnr updates Message-ID: <5474AD96.3050006@ugent.be> hi all, does anyone know where we can find the release notes and update rpms for gpfs.gnr? we logged a case with ibm a while ago, and we assumed that the fix for the issue was part of the regular gpfs updates (we assumed as much from the conversation with ibm tech support). many thanks, stijn

From L.A.Hurst at bham.ac.uk Wed Nov 26 10:14:26 2014 From: L.A.Hurst at bham.ac.uk (Laurence Alexander Hurst) Date: Wed, 26 Nov 2014 10:14:26 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> Message-ID: Hmm, mmrepquota is reporting no files owned by any of the users in question. I'll see if `find` disagrees. They have the default fileset user quotas applied, so they're not users we've edited to grant quota extensions to. We have had a problem (which IBM have acknowledged, iirc) whereby it is not possible to reset a user's quota back to the default if it has been modified, perhaps this is related? I'll see if `find` turns anything up or I'll raise a ticket with IBM and see what they think. I've pulled out a single example, but all 75 users I have are the same. mmrepquota gpfs | grep 8695 8695 nbu USR 0 0 5368709120 0 none | 0 0 0 0 none 8695 bb USR 0 0 1073741824 0 none | 0 0 0 0 none Thanks for your input.
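If `find` over the whole filesystem turns out to be too slow, I may try a policy scan instead; something along these lines (untested sketch -- list name and file paths are made up):

    cat > /tmp/uid8695.pol <<'EOF'
    RULE EXTERNAL LIST 'olduid' EXEC ''
    RULE 'findolduid' LIST 'olduid' WHERE USER_ID = 8695
    EOF
    mmapplypolicy gpfs -P /tmp/uid8695.pol -I defer -f /tmp/uid8695

which, if I've understood the defer mode correctly, should leave any matching paths in /tmp/uid8695.list.olduid without acting on them.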
Laurence On 25/11/2014 14:00, "Jonathan Buzzard" wrote: >On Tue, 2014-11-25 at 10:45 +0000, Laurence Alexander Hurst wrote: >> Hi all, >> >> We have noticed that once users are deleted (gone entirely from the >> passwd backend) and all their files removed, their uid is still >> reported by GPFS? quota tools (albeit with zero files and space usage). >> > >There is something somewhere that references them, because they do >disappear. I know because I cleared out a GPFS file system that had >files and directories used by "depreciated" user and group names, and >the check I was using to make sure I had got everything belonging to a >particular user or group was mmrepquota. > >JAB. > >-- >Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk >Fife, United Kingdom. > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chair at gpfsug.org Thu Nov 27 09:21:30 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 27 Nov 2014 09:21:30 +0000 Subject: [gpfsug-discuss] gpfs.gnr updates In-Reply-To: <5474AD96.3050006@ugent.be> References: <5474AD96.3050006@ugent.be> Message-ID: <5476ED1A.8050504@gpfsug.org> Hi Stijn, As far as I am aware, GNR updates are not publicly available for download. You should approach your reseller / IBM Business partner who should be able to supply you with the updates. IBMers, please feel free to correct this statement if in error. Jez On 25/11/14 16:25, Stijn De Weirdt wrote: > hi all, > > does anyone know where we can find the release notes and update rpms > for gpfs.gnr? > we logged a case with ibm a while ago, and we assumed that the fix for > the issue was part of the regular gpfs updates (we assumed as much > from the conversation with ibm tech support). > > many thanks, > > stijn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Thu Nov 27 09:47:59 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Nov 2014 09:47:59 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> Message-ID: <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: > Hmm, mmrepquota is reporting no files owned by any of the users in > question. I?ll see if `find` disagrees. > They have the default fileset > user quotas applied, so they?re not users we?ve edited to grant quota > extensions to. We have had a problem (which IBM have acknowledged, iirc) > whereby it is not possible to reset a user?s quota back to the default if > it has been modified, perhaps this is related? I?ll see if `find` turns > anything up or I?ll raise a ticket with IBM and see what they think. > > I?ve pulled out a single example, but all 75 users I have are the same. > > mmrepquota gpfs | grep 8695 > 8695 nbu USR 0 0 5368709120 0 > none | 0 0 0 0 none > 8695 bb USR 0 0 1073741824 0 > none | 0 0 0 0 none > While the number of files and usage is zero look at those "in doubt" numbers. Until these also fall to zero then the users are not going to disappear from the quota reporting would be my guess. Quite why the "in doubt" numbers are still so large is another question. 
I have vague recollections of this happening to me when I deleted large amounts of data belonging to a user down to zero when I was clearing the file system up I mentioned before. Though to be honest most of my clearing up was identifying who the files really belonged to (there had in the distance past been a change of usernames; gone from local usernames to using the university wide ones and not everyone had claimed their files. All related to a move to using Active Directory) and doing chown's on the data. I think what happens is when the file number goes to zero the quota system stops updating for that user and if there is anything "in doubt" it never gets updated and sticks around forever. Might be worth creating a couple of files for the user in the appropriate filesets and then give it a bit of time and see if the output of mmrepquota matches what you believe is the real case. If this works and the "in doubt" number goes to zero I would at this point do a chown to a different user that is not going away and then delete the files. Something else to consider is that they might be in an ACL somewhere which is confusing the quota system. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From peserocka at gmail.com Thu Nov 27 10:01:55 2014 From: peserocka at gmail.com (P Serocka) Date: Thu, 27 Nov 2014 18:01:55 +0800 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> Message-ID: Any chance to run mmcheckquota? which should remove all "doubt"... On 2014 Nov 27. md, at 17:47 st, Jonathan Buzzard wrote: > On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: >> Hmm, mmrepquota is reporting no files owned by any of the users in >> question. I?ll see if `find` disagrees. >> They have the default fileset >> user quotas applied, so they?re not users we?ve edited to grant quota >> extensions to. We have had a problem (which IBM have acknowledged, iirc) >> whereby it is not possible to reset a user?s quota back to the default if >> it has been modified, perhaps this is related? I?ll see if `find` turns >> anything up or I?ll raise a ticket with IBM and see what they think. >> >> I?ve pulled out a single example, but all 75 users I have are the same. >> >> mmrepquota gpfs | grep 8695 >> 8695 nbu USR 0 0 5368709120 0 >> none | 0 0 0 0 none >> 8695 bb USR 0 0 1073741824 0 >> none | 0 0 0 0 none >> > > While the number of files and usage is zero look at those "in doubt" > numbers. Until these also fall to zero then the users are not going to > disappear from the quota reporting would be my guess. Quite why the "in > doubt" numbers are still so large is another question. I have vague > recollections of this happening to me when I deleted large amounts of > data belonging to a user down to zero when I was clearing the file > system up I mentioned before. Though to be honest most of my clearing up > was identifying who the files really belonged to (there had in the > distance past been a change of usernames; gone from local usernames to > using the university wide ones and not everyone had claimed their files. > All related to a move to using Active Directory) and doing chown's on > the data. 
> I think what happens is when the file number goes to zero the quota > system stops updating for that user and if there is anything "in doubt" > it never gets updated and sticks around forever. > > Might be worth creating a couple of files for the user in the > appropriate filesets and then give it a bit of time and see if the > output of mmrepquota matches what you believe is the real case. If this > works and the "in doubt" number goes to zero I would at this point do a > chown to a different user that is not going away and then delete the > files. > > Something else to consider is that they might be in an ACL somewhere > which is confusing the quota system. > > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From jonathan at buzzard.me.uk Thu Nov 27 10:02:03 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Nov 2014 10:02:03 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> Message-ID: <1417082523.4171.104.camel@buzzard.phy.strath.ac.uk> On Thu, 2014-11-27 at 09:47 +0000, Jonathan Buzzard wrote: > On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: >> Hmm, mmrepquota is reporting no files owned by any of the users in >> question. I?ll see if `find` disagrees. >> They have the default fileset >> user quotas applied, so they?re not users we?ve edited to grant quota >> extensions to. We have had a problem (which IBM have acknowledged, iirc) >> whereby it is not possible to reset a user?s quota back to the default if >> it has been modified, perhaps this is related? I?ll see if `find` turns >> anything up or I?ll raise a ticket with IBM and see what they think. >> >> I?ve pulled out a single example, but all 75 users I have are the same. >> >> mmrepquota gpfs | grep 8695 >> 8695 nbu USR 0 0 5368709120 0 >> none | 0 0 0 0 none >> 8695 bb USR 0 0 1073741824 0 >> none | 0 0 0 0 none >> > > While the number of files and usage is zero look at those "in doubt" > numbers. Ignore that, those are quota numbers. Hard when the column headings are missing. Anyway a "Homer Simpson" moment coming up... Simple answer really: remove the quotas for those users in those file sets (I am presuming they are per fileset user hard limits). They are sticking around in mmrepquota because they have a hard limit set. D'oh! JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From peserocka at gmail.com Thu Nov 27 10:06:31 2014 From: peserocka at gmail.com (P Serocka) Date: Thu, 27 Nov 2014 18:06:31 +0800 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1417082523.4171.104.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> <1417082523.4171.104.camel@buzzard.phy.strath.ac.uk> Message-ID: <44A03A01-4010-4210-8892-2AE37451EEFA@gmail.com> ;-) Ignore my other message on mmcheckquota then.
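And for actually clearing the explicit limits back to the defaults, something like this for the example UID should do it (syntax from memory -- check the mmedquota man page, especially for the per-fileset case):

    mmedquota -d -u 8695

though that may of course run into the reset-to-default problem Laurence mentioned earlier.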
On 27 Nov 2014, at 18:02, Jonathan Buzzard wrote: > On Thu, 2014-11-27 at 09:47 +0000, Jonathan Buzzard wrote: >> On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: >>> Hmm, mmrepquota is reporting no files owned by any of the users in >>> question. I?ll see if `find` disagrees. >>> They have the default fileset >>> user quotas applied, so they?re not users we?ve edited to grant quota >>> extensions to. We have had a problem (which IBM have acknowledged, iirc) >>> whereby it is not possible to reset a user?s quota back to the default if >>> it has been modified, perhaps this is related? I?ll see if `find` turns >>> anything up or I?ll raise a ticket with IBM and see what they think. >>> >>> I?ve pulled out a single example, but all 75 users I have are the same. >>> >>> mmrepquota gpfs | grep 8695 >>> 8695 nbu USR 0 0 5368709120 0 >>> none | 0 0 0 0 none >>> 8695 bb USR 0 0 1073741824 0 >>> none | 0 0 0 0 none >>> >> >> While the number of files and usage is zero look at those "in doubt" >> numbers. > > Ignore that, those are quota numbers. Hard when the column headings are > missing. > > Anyway a "Homer Simpson" moment coming up... > > Simple answer really: remove the quotas for those users in those file > sets (I am presuming they are per fileset user hard limits). They are > sticking around in mmrepquota because they have a hard limit set. D'oh! > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
URL: From stijn.deweirdt at ugent.be Wed Nov 5 10:25:07 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Wed, 05 Nov 2014 11:25:07 +0100 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <5459FB03.8080801@ugent.be> yes, this behaviour is normal, and a bit annoying sometimes, but GPFS doesn't really like (or isn't designed) to run stuff on the NSDs directly. the GSS probably send the data to the other NSD to distribute the (possible) compute cost from the raid, where there is none for regular LUN access. (but you also shouldn't be running on the GSS NSDs ;) stijn On 11/05/2014 11:15 AM, Salvatore Di Nardo wrote: > Hello again, > to understand better GPFS, recently i build up an test gpfs cluster > using some old hardware that was going to be retired. THe storage was > SAN devices, so instead to use native raids I went for the old school > gpfs. the configuration is basically: > > 3x servers > 3x san storages > 2x san switches > > I did no zoning, so all the servers can see all the LUNs, but on nsd > creation I gave each LUN a primary, secondary and third server. with the > following rule: > > STORAGE > primary > secondary > tertiary > storage1 > server1 > server2 server3 > storage2 server2 server3 server1 > storage3 server3 server1 server2 > > > > looking at the mmcrnsd, it was my understanding that the primary server > is the one that wrote on the NSD unless it fails, then the following > server take the ownership of the lun. > > Now come the question: > when i did from server 1 a dd surprisingly i discovered that server1 was > writing to all the luns. the other 2 server was doing nothing. this > behaviour surprises me because on GSS only the RG owner can write, so > one server "ask" the other server to write to his own RG's.In fact on > GSS can be seen a lot of ETH traffic and io/s on each server. While i > understand that the situation it's different I'm puzzled about the fact > that all the servers seems able to write to all the luns. > > SAN deviced usually should be connected to one server only, as paralled > access could create data corruption. In environments where you connect a > SAN to multiple servers ( example VMWARE cloud) its softeware task to > avoid data overwriting between server ( and data corruption ). > > Honestly, what i was expecting is: server1 writing on his own luns, and > data traffic ( ethernet) to the other 2 server , basically asking *them* > to write on the other luns. I dont know if this behaviour its normal or > not. I triied to find a documentation about that, but could not find any. > > Could somebody tell me if this _/"every server write to all the luns"/_ > its intended or not? > > Thanks in advance, > Salvatore > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From kgunda at in.ibm.com Wed Nov 5 10:25:07 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Wed, 5 Nov 2014 15:55:07 +0530 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: In case of SAN connectivity, all nodes can write to disks. This avoids going over the network to get to disks. Only when local access isn't present either due to connectivity or zoning will it use the defined NSD server. 
If there is a need to have the node always use a NSD server, you can enforce it via mount option -o usensdserver=always If the first nsd server is down, it will use the next NSD server in the list. In general NSD servers are a priority list of servers rather than a primary/secondary config which is the case when using native raid. Also note that multiple nodes accessing the same disk will not cause corruption as higher level token mgmt in GPFS will take care of data consistency. Regards Kalyan C Gunda STSM, Elastic Storage Development Member of The IBM Academy of Technology EGL D Block, Bangalore From: Salvatore Di Nardo To: gpfsug main discussion list Date: 11/05/2014 03:44 PM Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs Sent by: gpfsug-discuss-bounces at gpfsug.org Hello again, to understand better GPFS, recently i build up an test gpfs cluster using some old hardware that was going to be retired. THe storage was SAN devices, so instead to use native raids I went for the old school gpfs. the configuration is basically: 3x servers 3x san storages 2x san switches I did no zoning, so all the servers can see all the LUNs, but on nsd creation I gave each LUN a primary, secondary and third server. with the following rule: |-------------------+---------------+--------------------+---------------| |STORAGE |primary |secondary |tertiary | |-------------------+---------------+--------------------+---------------| |storage1 |server1 |server2 |server3 | |-------------------+---------------+--------------------+---------------| |storage2 |server2 |server3 |server1 | |-------------------+---------------+--------------------+---------------| |storage3 |server3 |server1 |server2 | |-------------------+---------------+--------------------+---------------| looking at the mmcrnsd, it was my understanding that the primary server is the one that wrote on the NSD unless it fails, then the following server take the ownership of the lun. Now come the question: when i did from server 1 a dd surprisingly i discovered that server1 was writing to all the luns. the other 2 server was doing nothing. this behaviour surprises me because on GSS only the RG owner can write, so one server "ask" the other server to write to his own RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each server. While i understand that the situation it's different I'm puzzled about the fact that all the servers seems able to write to all the luns. SAN deviced usually should be connected to one server only, as paralled access could create data corruption. In environments where you connect a SAN to multiple servers ( example VMWARE cloud) its softeware task to avoid data overwriting between server ( and data corruption ). Honestly, what? i was expecting is: server1 writing on his own luns, and data traffic ( ethernet) to the other 2 server , basically asking them to write on the other luns. I dont know if this behaviour its normal or not. I triied to find a documentation about that, but could not find any. Could somebody? tell me if this "every server write to all the luns" its intended or not? 
Thanks in advance, Salvatore_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sdinardo at ebi.ac.uk Wed Nov 5 10:33:57 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 05 Nov 2014 10:33:57 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> References: <5459F8DF.2090806@ebi.ac.uk> <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> Message-ID: <5459FD15.3070105@ebi.ac.uk> I understand that my test its a bit particular because the client was also one of the servers. Usually clients don't have direct access to the storages, but still it made think, hot the things are supposed to work. For example i did another test with 3 dd's, one each server. All the servers was writing to all the luns. In other words a lun was accessed in parallel by 3 servers. Its that a problem, or gpfs manage properly the concurrency and avoid data corruption? I'm asking because i was not expecting a server to write to an NSD he doesn't own, even if its locally available. I thought that the general availablity was for failover, not for parallel access. Regards, Salvatore On 05/11/14 10:22, Vic Cornell wrote: > Hi Salvatore, > > If you are doing the IO on the NSD server itself and it can see all of > the NSDs it will use its "local? access to write to the LUNS. > > You need some GPFS clients to see the workload spread across all of > the NSD servers. > > Vic > > > >> On 5 Nov 2014, at 10:15, Salvatore Di Nardo > > wrote: >> >> Hello again, >> to understand better GPFS, recently i build up an test gpfs cluster >> using some old hardware that was going to be retired. THe storage was >> SAN devices, so instead to use native raids I went for the old school >> gpfs. the configuration is basically: >> >> 3x servers >> 3x san storages >> 2x san switches >> >> I did no zoning, so all the servers can see all the LUNs, but on nsd >> creation I gave each LUN a primary, secondary and third server. with >> the following rule: >> >> STORAGE >> primary >> secondary >> tertiary >> storage1 >> server1 >> server2 server3 >> storage2 server2 server3 server1 >> storage3 server3 server1 server2 >> >> >> >> looking at the mmcrnsd, it was my understanding that the primary >> server is the one that wrote on the NSD unless it fails, then the >> following server take the ownership of the lun. >> >> Now come the question: >> when i did from server 1 a dd surprisingly i discovered that server1 >> was writing to all the luns. the other 2 server was doing nothing. >> this behaviour surprises me because on GSS only the RG owner can >> write, so one server "ask" the other server to write to his own >> RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each >> server. While i understand that the situation it's different I'm >> puzzled about the fact that all the servers seems able to write to >> all the luns. >> >> SAN deviced usually should be connected to one server only, as >> paralled access could create data corruption. In environments where >> you connect a SAN to multiple servers ( example VMWARE cloud) its >> softeware task to avoid data overwriting between server ( and data >> corruption ). >> >> Honestly, what i was expecting is: server1 writing on his own luns, >> and data traffic ( ethernet) to the other 2 server , basically asking >> *them* to write on the other luns. 
I dont know if this behaviour its >> normal or not. I triied to find a documentation about that, but could >> not find any. >> >> Could somebody tell me if this _/"every server write to all the >> luns"/_ its intended or not? >> >> Thanks in advance, >> Salvatore >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Nov 5 10:38:48 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 05 Nov 2014 10:38:48 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459F8DF.2090806@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <1415183928.3474.4.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-11-05 at 10:15 +0000, Salvatore Di Nardo wrote: [SNIP] > Now come the question: > when i did from server 1 a dd surprisingly i discovered that server1 > was writing to all the luns. the other 2 server was doing nothing. > this behaviour surprises me because on GSS only the RG owner can > write, so one server "ask" the other server to write to his own > RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each > server. While i understand that the situation it's different I'm > puzzled about the fact that all the servers seems able to write to all > the luns. The difference is that in GSS the NSD servers are in effect doing software RAID on the disks. Therefore they and they alone can write to the NSD. In the traditional setup the NSD is on a RAID device on SAN controller and multiple machines are able to access the block device at the same time with token management in GPFS preventing corruption. I guess from a technical perspective you could have the GSS software RAID distributed between the NSD servers, but that would be rather more complex software and it is no surprise IBM have gone down the easy route. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From viccornell at gmail.com Wed Nov 5 10:42:22 2014 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 5 Nov 2014 10:42:22 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459FD15.3070105@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> <5459FD15.3070105@ebi.ac.uk> Message-ID: <75801708-F65D-4B39-82CA-6DC4FB5AA6EB@gmail.com> > On 5 Nov 2014, at 10:33, Salvatore Di Nardo wrote: > > I understand that my test its a bit particular because the client was also one of the servers. > Usually clients don't have direct access to the storages, but still it made think, hot the things are supposed to work. > > For example i did another test with 3 dd's, one each server. All the servers was writing to all the luns. > In other words a lun was accessed in parallel by 3 servers. > > Its that a problem, or gpfs manage properly the concurrency and avoid data corruption? Its not a problem if you use locks. Remember the clients - even the ones running on the NSD servers are talking to the filesystem - not to the LUNS/NSDs directly. It is the NSD processes that talk to the NSDs. 
So loosely speaking it is as if all of the processes you are running were running on a single system with a local filesystem So yes - gpfs is designed to manage the problems created by having a distributed, shared filesystem, and does a pretty good job IMHO. > I'm asking because i was not expecting a server to write to an NSD he doesn't own, even if its locally available. > I thought that the general availablity was for failover, not for parallel access. Bear in mind that GPFS supports a number of access models, one of which is where all of the systems in the cluster have access to all of the disks. So parallel access is most commonly used for failover, but that is not the limit of its capabilities. Vic > > > Regards, > Salvatore > > > > On 05/11/14 10:22, Vic Cornell wrote: >> Hi Salvatore, >> >> If you are doing the IO on the NSD server itself and it can see all of the NSDs it will use its "local? access to write to the LUNS. >> >> You need some GPFS clients to see the workload spread across all of the NSD servers. >> >> Vic >> >> >> >>> On 5 Nov 2014, at 10:15, Salvatore Di Nardo > wrote: >>> >>> Hello again, >>> to understand better GPFS, recently i build up an test gpfs cluster using some old hardware that was going to be retired. THe storage was SAN devices, so instead to use native raids I went for the old school gpfs. the configuration is basically: >>> >>> 3x servers >>> 3x san storages >>> 2x san switches >>> >>> I did no zoning, so all the servers can see all the LUNs, but on nsd creation I gave each LUN a primary, secondary and third server. with the following rule: >>> >>> STORAGE >>> primary >>> secondary >>> tertiary >>> storage1 >>> server1 >>> server2 server3 >>> storage2 server2 server3 server1 >>> storage3 server3 server1 server2 >>> >>> looking at the mmcrnsd, it was my understanding that the primary server is the one that wrote on the NSD unless it fails, then the following server take the ownership of the lun. >>> >>> Now come the question: >>> when i did from server 1 a dd surprisingly i discovered that server1 was writing to all the luns. the other 2 server was doing nothing. this behaviour surprises me because on GSS only the RG owner can write, so one server "ask" the other server to write to his own RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on each server. While i understand that the situation it's different I'm puzzled about the fact that all the servers seems able to write to all the luns. >>> >>> SAN deviced usually should be connected to one server only, as paralled access could create data corruption. In environments where you connect a SAN to multiple servers ( example VMWARE cloud) its softeware task to avoid data overwriting between server ( and data corruption ). >>> >>> Honestly, what i was expecting is: server1 writing on his own luns, and data traffic ( ethernet) to the other 2 server , basically asking them to write on the other luns. I dont know if this behaviour its normal or not. I triied to find a documentation about that, but could not find any. >>> >>> Could somebody tell me if this "every server write to all the luns" its intended or not? 
>>> >>> Thanks in advance, >>> Salvatore >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Wed Nov 5 10:46:52 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 05 Nov 2014 10:46:52 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: References: <5459F8DF.2090806@ebi.ac.uk> Message-ID: <545A001C.1040908@ebi.ac.uk> On 05/11/14 10:25, Kalyan Gunda wrote: > Also note that multiple nodes accessing the same disk will not cause > corruption as higher level token mgmt in GPFS will take care of data > consistency. This is exactly what I wanted to be sure of. Thanks! From ewahl at osc.edu Wed Nov 5 13:56:38 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 5 Nov 2014 13:56:38 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <545A001C.1040908@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> , <545A001C.1040908@ebi.ac.uk> Message-ID: You can designate how many of the nodes do token management as well. mmlscluster should show which nodes are designated "manager". Under some circumstances you may want to increase the default number of manager nodes on heavily used file systems using mmchnode, especially with few NSDs and many writers. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Wednesday, November 05, 2014 5:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] maybe a silly question about "old school" gpfs On 05/11/14 10:25, Kalyan Gunda wrote: > Also note that multiple nodes accessing the same disk will not cause > corruption as higher level token mgmt in GPFS will take care of data > consistency. This is exactly what I wanted to be sure of. Thanks! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pavel.pokorny at datera.cz Fri Nov 7 11:15:34 2014 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Fri, 7 Nov 2014 12:15:34 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID: Hello to all, I would like to ask a question about the pagepool and the protection of data written through it. Is there a possibility of losing data written to GPFS in a situation where the data is stored in the pagepool but not yet written to disk? I think that for regular file system work this can be solved using the GPFS journal. What about using GPFS as an NFS store for VMware datastores? Thank you for your answers, Pavel -- Ing. Pavel Pokorný DATERA s.r.o. | Ovocný trh 580/2 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz -------------- next part -------------- An HTML attachment was scrubbed...
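A note on the pagepool question above: whether a write can be lost on a node crash depends mostly on how the application asks for the data to be written. A minimal sketch from the client side, assuming a GPFS file system mounted at /gpfs/test (path and sizes are made up):

    # buffered write: dd can return while the data is still only in the pagepool,
    # so a node crash straight afterwards can lose it (the GPFS journal protects
    # metadata consistency, not unflushed user data)
    dd if=/dev/zero of=/gpfs/test/buffered.dat bs=1M count=1024

    # synchronous variants: dd does not return until the data is stable on disk
    dd if=/dev/zero of=/gpfs/test/stable.dat bs=1M count=1024 oflag=sync
    dd if=/dev/zero of=/gpfs/test/flushed.dat bs=1M count=1024 conv=fsync

NFS clients that mount with sync (and VMware NFS datastores, which set the stable flag on every write) effectively force the synchronous behaviour.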
URL: From lhorrocks-barlow at ocf.co.uk Wed Nov 5 10:47:06 2014 From: lhorrocks-barlow at ocf.co.uk (Laurence Horrocks- Barlow) Date: Wed, 5 Nov 2014 10:47:06 +0000 Subject: [gpfsug-discuss] maybe a silly question about "old school" gpfs In-Reply-To: <5459FD15.3070105@ebi.ac.uk> References: <5459F8DF.2090806@ebi.ac.uk> <3F74C441-C25D-4F19-AD05-04AD897A08D3@gmail.com> <5459FD15.3070105@ebi.ac.uk> Message-ID: <545A002A.4080301@ocf.co.uk> Hi Salvatore, GSS and GPFS systems are different beasts. In a traditional GPFS configuration I would expect any NSD server to write to any/all LUN's that it can see as a local disk providing it's part of the same FS. In GSS there is effectively a software RAID level added on top of the disks, with this I would expect only the RG owner to write down to the vdisk. As for corruption, GPFS uses a token system to manage access to LUN's, Metadata, etc. Kind Regards, Laurence Horrocks-Barlow Linux Systems Software Engineer OCF plc Tel: +44 (0)114 257 2200 Fax: +44 (0)114 257 0022 Web: www.ocf.co.uk Blog: blog.ocf.co.uk Twitter: @ocfplc OCF plc is a company registered in England and Wales. Registered number 4132533, VAT number GB 780 6803 14. Registered office address: OCF plc, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG. This message is private and confidential. If you have received this message in error, please notify us and remove it from your system. On 11/05/2014 10:33 AM, Salvatore Di Nardo wrote: > I understand that my test its a bit particular because the client was > also one of the servers. > Usually clients don't have direct access to the storages, but still it > made think, hot the things are supposed to work. > > For example i did another test with 3 dd's, one each server. All the > servers was writing to all the luns. > In other words a lun was accessed in parallel by 3 servers. > > Its that a problem, or gpfs manage properly the concurrency and avoid > data corruption? > I'm asking because i was not expecting a server to write to an NSD he > doesn't own, even if its locally available. > I thought that the general availablity was for failover, not for > parallel access. > > > Regards, > Salvatore > > > > On 05/11/14 10:22, Vic Cornell wrote: >> Hi Salvatore, >> >> If you are doing the IO on the NSD server itself and it can see all >> of the NSDs it will use its "local? access to write to the LUNS. >> >> You need some GPFS clients to see the workload spread across all of >> the NSD servers. >> >> Vic >> >> >> >>> On 5 Nov 2014, at 10:15, Salvatore Di Nardo >> > wrote: >>> >>> Hello again, >>> to understand better GPFS, recently i build up an test gpfs cluster >>> using some old hardware that was going to be retired. THe storage >>> was SAN devices, so instead to use native raids I went for the old >>> school gpfs. the configuration is basically: >>> >>> 3x servers >>> 3x san storages >>> 2x san switches >>> >>> I did no zoning, so all the servers can see all the LUNs, but on nsd >>> creation I gave each LUN a primary, secondary and third server. with >>> the following rule: >>> >>> STORAGE >>> primary >>> secondary >>> tertiary >>> storage1 >>> server1 >>> server2 server3 >>> storage2 server2 server3 server1 >>> storage3 server3 server1 server2 >>> >>> >>> >>> looking at the mmcrnsd, it was my understanding that the primary >>> server is the one that wrote on the NSD unless it fails, then the >>> following server take the ownership of the lun. 
>>> >>> Now come the question: >>> when i did from server 1 a dd surprisingly i discovered that server1 >>> was writing to all the luns. the other 2 server was doing nothing. >>> this behaviour surprises me because on GSS only the RG owner can >>> write, so one server "ask" the other server to write to his own >>> RG's.In fact on GSS can be seen a lot of ETH traffic and io/s on >>> each server. While i understand that the situation it's different >>> I'm puzzled about the fact that all the servers seems able to write >>> to all the luns. >>> >>> SAN deviced usually should be connected to one server only, as >>> paralled access could create data corruption. In environments where >>> you connect a SAN to multiple servers ( example VMWARE cloud) its >>> softeware task to avoid data overwriting between server ( and data >>> corruption ). >>> >>> Honestly, what i was expecting is: server1 writing on his own luns, >>> and data traffic ( ethernet) to the other 2 server , basically >>> asking *them* to write on the other luns. I dont know if this >>> behaviour its normal or not. I triied to find a documentation about >>> that, but could not find any. >>> >>> Could somebody tell me if this _/"every server write to all the >>> luns"/_ its intended or not? >>> >>> Thanks in advance, >>> Salvatore >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dhildeb at us.ibm.com Fri Nov 7 22:42:06 2014 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Fri, 7 Nov 2014 23:42:06 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? In-Reply-To: References: Message-ID: Hi Paul, GPFS correctly implements POSIX semantics and NFS close-to-open semantics. Its a little complicated, but effectively what this means is that when the application issues certain calls to ensure data/metadata is "stable" (e.g., fsync), then it is guaranteed to be stable. It also controls ordering between nodes among many other things. As part of making sure data is stable, the GPFS recovery journal is used in a variety of instances. With VMWare ESX using NFS to GPFS, then the same thing occurs, except the situation is even more simple since every write request will have the 'stable' flag set, ensuring it does writethrough to the storage system. Dean Hildebrand IBM Almaden Research Center From: Pavel Pokorny To: gpfsug-discuss at gpfsug.org Date: 11/07/2014 03:15 AM Subject: [gpfsug-discuss] GPFS - pagepool data protection? Sent by: gpfsug-discuss-bounces at gpfsug.org Hello to all, I would like to ask question about pagepool and protection of data written through pagepool. Is there a possibility of loosing data written to GPFS in situation that data are stored in pagepool but still not written to disks? I think that for regular file system work this can be solved using GPFS journal. What about using GPFS as a NFS store for VMware datastores? Thank you for your answers, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o.?|?Ovocn? 
trh 580/2?|?Praha?|?Czech Republic www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jamiedavis at us.ibm.com Sat Nov 8 23:13:17 2014 From: jamiedavis at us.ibm.com (James Davis) Date: Sat, 8 Nov 2014 18:13:17 -0500 Subject: [gpfsug-discuss] Hi everybody Message-ID: Hey all, My name is Jamie Davis and I work for IBM on the GPFS test team. I'm interested in learning more about how customers use GPFS and what typical questions and issues are like, and I thought joining this mailing list would be a good start. If my presence seems inappropriate or makes anyone uncomfortable I can leave the list. --- I don't know how often GPFS users look in /usr/lpp/mmfs/samples...but while I'm sending a mass email, I thought I'd take a moment to point anyone running GPFS 4.1.0.4 to /usr/lpp/mmfs/samples/ilm/mmfind. mmfind is basically a find-esque wrapper around mmapplypolicy that I wrote in response to complaints I've heard about the learning curve associated with writing policies for mmapplypolicy. Since it's in samples, use-at-your-own-risk and I make no promise that everything works correctly. The -skipPolicy and -saveTmpFiles flags will do everything but actually run mmapplypolicy -- I suggest you double-check its work before you run it on a production system. Please send me any comments on it if you give it a try! Jamie Davis GPFS Test IBM -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Mon Nov 10 16:18:24 2014 From: chair at gpfsug.org (Jez Tucker) Date: Mon, 10 Nov 2014 16:18:24 +0000 Subject: [gpfsug-discuss] SC 14 and storagebeers events this week Message-ID: <5460E550.8020705@gpfsug.org> Hi all Just a quick reminder that the IBM GPFS User Group is at SC '14 in New Orleans Nov 17th. Also, there's a social in London W1 - #storagebeers on Nov 13th. For more info on both of these, please see the main website: www.gpfsug.org Best, Jez -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Tue Nov 11 13:59:38 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Tue, 11 Nov 2014 13:59:38 +0000 Subject: [gpfsug-discuss] storagebeers postponed Message-ID: <5462164A.70607@gpfsug.org> Hi all I've just received notification that #storagebeers, due to happen 13th Nov, has unfortunately had to be postponed. I'll update you all with a new date when I receive it. Very best, Jez From jez at rib-it.org Tue Nov 11 16:49:48 2014 From: jez at rib-it.org (Jez Tucker) Date: Tue, 11 Nov 2014 16:49:48 +0000 Subject: [gpfsug-discuss] Hi everybody In-Reply-To: References: Message-ID: <54623E2C.2070903@rib-it.org> Hi Jamie, You're indeed very welcome. A few of the IBM devs are list members and their presence is appreciated. I suggest if you want to know more regarding use cases etc., ask some pointed questions. Discussion is good. Jez On 08/11/14 23:13, James Davis wrote: > > Hey all, > > My name is Jamie Davis and I work for IBM on the GPFS test team. 
I'm > interested in learning more about how customers use GPFS and what > typical questions and issues are like, and I thought joining this > mailing list would be a good start. If my presence seems inappropriate > or makes anyone uncomfortable I can leave the list. > > --- > > I don't know how often GPFS users look in /usr/lpp/mmfs/samples...but > while I'm sending a mass email, I thought I'd take a moment to point > anyone running GPFS 4.1.0.4 to /usr/lpp/mmfs/samples/ilm/mmfind. > mmfind is basically a find-esque wrapper around mmapplypolicy that I > wrote in response to complaints I've heard about the learning curve > associated with writing policies for mmapplypolicy. Since it's in > samples, use-at-your-own-risk and I make no promise that everything > works correctly. The -skipPolicy and -saveTmpFiles flags will do > everything but actually run mmapplypolicy -- I suggest you > double-check its work before you run it on a production system. > > Please send me any comments on it if you give it a try! > > Jamie Davis > GPFS Test > IBM > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavel.pokorny at datera.cz Wed Nov 12 12:20:57 2014 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Wed, 12 Nov 2014 13:20:57 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID: Hi, thanks. A I understand the write process to GPFS filesystem: 1. Application on a node makes write call 2. Token Manager stuff is done to coordinate the required-byte-range 3. mmfsd gets metadata from the file?s metanode 4. mmfsd acquires a buffer from the page pool 5. Data is moved from application data buffer to page pool buffer 6. VSD layer copies data from the page pool to the send pool and so on. What I am looking at and want to clarify is step 5. Situation when data is moved to page pool. What happen if the server will crash at tjis point? Will GPFS use journal to get to stable state? Thank you, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o. | Ovocn? trh 580/2 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz On Sat, Nov 8, 2014 at 1:00 PM, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: GPFS - pagepool data protection? (Dean Hildebrand) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 7 Nov 2014 23:42:06 +0100 > From: Dean Hildebrand > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS - pagepool data protection? > Message-ID: > < > OF1ED92A57.DD700837-ONC1257D89.007C4EF1-88257D89.007CB453 at us.ibm.com> > Content-Type: text/plain; charset="iso-8859-1" > > > Hi Paul, > > GPFS correctly implements POSIX semantics and NFS close-to-open semantics. 
> Its a little complicated, but effectively what this means is that when the > application issues certain calls to ensure data/metadata is "stable" (e.g., > fsync), then it is guaranteed to be stable. It also controls ordering > between nodes among many other things. As part of making sure data is > stable, the GPFS recovery journal is used in a variety of instances. > > With VMWare ESX using NFS to GPFS, then the same thing occurs, except the > situation is even more simple since every write request will have the > 'stable' flag set, ensuring it does writethrough to the storage system. > > Dean Hildebrand > IBM Almaden Research Center > > > > > From: Pavel Pokorny > To: gpfsug-discuss at gpfsug.org > Date: 11/07/2014 03:15 AM > Subject: [gpfsug-discuss] GPFS - pagepool data protection? > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > Hello to all, > I would like to ask question about pagepool and protection of data written > through pagepool. > Is there a possibility of loosing data written to GPFS in situation that > data are stored in pagepool but still not written to disks? > I think that for regular file system work this can be solved using GPFS > journal. What about using GPFS as a NFS store for VMware datastores? > Thank you for your answers, > Pavel > -- > Ing. Pavel Pokorn? > DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic > www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.html > > > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: graycol.gif > Type: image/gif > Size: 105 bytes > Desc: not available > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.gif > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 34, Issue 7 > ********************************************* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Wed Nov 12 14:05:03 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Wed, 12 Nov 2014 15:05:03 +0100 Subject: [gpfsug-discuss] IBM Software Defined Infrastructure Roadshow 2014 (Frankfurt, London & Paris) Message-ID: FYI: IBM Software Defined Infrastructure Roadshow 2014 (Frankfurt 02. Dec 2014, London 03. Dec 2014 & Paris 04. Dec 2014) https://www-950.ibm.com/events/wwe/grp/grp019.nsf/v17_events?openform&lp=platform_computing_roadshow&locale=en_GB P.S. The German GPFS technical team will be available for discussions in Frankfurt. Feel free to contact me. -frank- Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From dhildeb at us.ibm.com Sat Nov 15 20:31:53 2014 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Sat, 15 Nov 2014 12:31:53 -0800 Subject: [gpfsug-discuss] GPFS - pagepool data protection? 
In-Reply-To: References: Message-ID: Hi Pavel, You are more or less right in your description, but the key that I tried to convey in my first email is that GPFS only obey's POSIX. So your question can be answered by looking at how your application performs the write and does your application ask to make the data live only in the pagepool or on stable storage. By default posix says that file create and writes are unstable, so just doing a write puts it in the pagepool and will be lost if a crash occurs immediately after. To make it stable, the application must do something in posix to make it stable, of which there are many ways to do so, including but not limited to O_SYNC, DIO, some form of fsync post write, etc, etc... Dean Hildebrand IBM Almaden Research Center From: Pavel Pokorny To: gpfsug-discuss at gpfsug.org Date: 11/12/2014 04:21 AM Subject: Re: [gpfsug-discuss] GPFS - pagepool data protection? Sent by: gpfsug-discuss-bounces at gpfsug.org Hi, thanks. A I understand the write process to GPFS filesystem: 1.?Application on a node makes write call 2.?Token Manager stuff is done to coordinate the required-byte-range 3.?mmfsd gets metadata from the file?s metanode 4.?mmfsd acquires a buffer from the page pool 5.?Data is moved from application data buffer to page pool buffer 6. VSD layer copies data from the page pool to the send pool ?and so on. What I am looking at and want to clarify is step 5. Situation when data is moved to page pool. What happen if the server will crash at tjis point? Will GPFS use journal to get to stable state? Thank you, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz On Sat, Nov 8, 2014 at 1:00 PM, wrote: Send gpfsug-discuss mailing list submissions to ? ? ? ? gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit ? ? ? ? http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to ? ? ? ? gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at ? ? ? ? gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: ? ?1. Re: GPFS - pagepool data protection? (Dean Hildebrand) ---------------------------------------------------------------------- Message: 1 Date: Fri, 7 Nov 2014 23:42:06 +0100 From: Dean Hildebrand To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID: ? ? ? ? < OF1ED92A57.DD700837-ONC1257D89.007C4EF1-88257D89.007CB453 at us.ibm.com> Content-Type: text/plain; charset="iso-8859-1" Hi Paul, GPFS correctly implements POSIX semantics and NFS close-to-open semantics. Its a little complicated, but effectively what this means is that when the application issues certain calls to ensure data/metadata is "stable" (e.g., fsync), then it is guaranteed to be stable.? It also controls ordering between nodes among many other things.? As part of making sure data is stable, the GPFS recovery journal is used in a variety of instances. With VMWare ESX using NFS to GPFS, then the same thing occurs, except the situation is even more simple since every write request will have the 'stable' flag set, ensuring it does writethrough to the storage system. Dean Hildebrand IBM Almaden Research Center From:? ?Pavel Pokorny To:? ? ?gpfsug-discuss at gpfsug.org Date:? ?11/07/2014 03:15 AM Subject:? ? ? ? 
[gpfsug-discuss] GPFS - pagepool data protection? Sent by:? ? ? ? gpfsug-discuss-bounces at gpfsug.org Hello to all, I would like to ask question about pagepool and protection of data written through pagepool. Is there a possibility of loosing data written to GPFS in situation that data are stored in pagepool but still not written to disks? I think that for regular file system work this can be solved using GPFS journal. What about using GPFS as a NFS store for VMware datastores? Thank you for your answers, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o.?|?Ovocn? trh 580/2?|?Praha?|?Czech Republic www.datera.cz?|?Mobil:?+420 602 357 194?|?E-mail:?pavel.pokorny at datera.cz _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.html > -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141107/ecec5a47/attachment-0001.gif > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 34, Issue 7 ********************************************* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From seanlee at tw.ibm.com Mon Nov 17 09:49:39 2014 From: seanlee at tw.ibm.com (Sean S Lee) Date: Mon, 17 Nov 2014 17:49:39 +0800 Subject: [gpfsug-discuss] GPFS - pagepool data protection? In-Reply-To: References: Message-ID: Hi Pavel, Most popular filesystems work that way. Write buffering improves the performance at the expense of some risk. Today most applications and all modern OS correctly handle "crash consistency", meaning they can recover from uncommitted writes. If you have data which absolutely cannot tolerate any "in-flight" data loss, it requires significant planning and resources on multiple levels, but as far as GPFS is concerned you could create a small file system and data (VMDK's) or serve GPFS or cNFS (mount GPFS with "syncfs", mount NFS with sync,no_wdelay) to VM clients from those filesystems. Your VM OS (VMDK) could be on a regular GPFS file system and your app data and logs could be on a small GPFS with synchronous writes. Regards Sean -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavel.pokorny at datera.cz Mon Nov 17 12:49:26 2014 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Mon, 17 Nov 2014 13:49:26 +0100 Subject: [gpfsug-discuss] GPFS - pagepool data protection? Message-ID: Hello, thanks you for all the answers, It is more clear now. Regards, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o. | Ovocn? 
trh 580/2 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz On Mon, Nov 17, 2014 at 1:00 PM, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. GPFS - pagepool data protection? (Sean S Lee) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 17 Nov 2014 17:49:39 +0800 > From: Sean S Lee > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] GPFS - pagepool data protection? > Message-ID: > < > OF20A72494.9E59B93F-ON48257D93.00350BA6-48257D93.0035F912 at tw.ibm.com> > Content-Type: text/plain; charset="us-ascii" > > > Hi Pavel, > > Most popular filesystems work that way. > > Write buffering improves the performance at the expense of some risk. > Today most applications and all modern OS correctly handle "crash > consistency", meaning they can recover from uncommitted writes. > > If you have data which absolutely cannot tolerate any "in-flight" data > loss, it requires significant planning and resources on multiple levels, > but as far as GPFS is concerned you could create a small file system and > data (VMDK's) or serve GPFS or cNFS (mount GPFS with "syncfs", mount NFS > with sync,no_wdelay) to VM clients from those filesystems. > Your VM OS (VMDK) could be on a regular GPFS file system and your app data > and logs could be on a small GPFS with synchronous writes. > > Regards > Sean > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20141117/1eb905cc/attachment-0001.html > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 34, Issue 13 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From orlando.richards at ed.ac.uk Wed Nov 19 16:35:44 2014 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Wed, 19 Nov 2014 16:35:44 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests Message-ID: <546CC6E0.1010800@ed.ac.uk> Hi folks, Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". --- Orlando -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
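One route that is not raised in the replies that follow, so treat it as a hedged suggestion rather than anything from the thread: keep the guests in their own small GPFS cluster and give them a multi-cluster remote mount of the file system owned by the bare-metal cluster. A rough sketch with invented cluster, node and file system names:

    # on both clusters: generate cluster keys, then exchange the public keys out of band
    mmauth genkey new

    # on the owning (bare metal) cluster: allow the guest cluster to mount gpfs0
    mmauth add guests.cloud.example -k /tmp/guests_id_rsa.pub
    mmauth grant guests.cloud.example -f /dev/gpfs0

    # on the guest cluster: register the owning cluster and mount the file system remotely
    mmremotecluster add storage.example -n nsd1,nsd2,nsd3 -k /tmp/storage_id_rsa.pub
    mmremotefs add rgpfs0 -f gpfs0 -C storage.example -T /gpfs0
    mmmount rgpfs0 -a

This still assumes plain IP connectivity from every guest to the NSD servers, so the networking questions discussed in the replies below apply either way.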
From S.J.Thompson at bham.ac.uk Wed Nov 19 18:36:30 2014 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 19 Nov 2014 18:36:30 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: <546CC6E0.1010800@ed.ac.uk> References: <546CC6E0.1010800@ed.ac.uk> Message-ID: I was asking this question at the gpfs forum on Monday at SC, but there didn't seem to be much on how we could do it. One of the suggestions was to basically use NFS, or there is the Manila component of OpenStack coming, but that still isn't really true gpfs access. I did wonder about virtio, but I'm not sure whether that would work with gpfs passed through from the hosting system. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Orlando Richards [orlando.richards at ed.ac.uk] Sent: 19 November 2014 16:35 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] GPFS inside OpenStack guests Hi folks, Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as a "GPFS client". --- Orlando -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Wed Nov 19 19:00:50 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 19 Nov 2014 11:00:50 -0800 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: <546CC6E0.1010800@ed.ac.uk> References: <546CC6E0.1010800@ed.ac.uk> Message-ID: Technically there are multiple ways to do this. 1. you can use the NSD protocol for this; you just need to have adequate network resources (or use PCI pass-through of the network adapter to the guest) 2. you attach the physical disks as virtio block devices 3. pass-through of the block HBA (e.g. FC adapter) into the guest. If you use virtio you need to make sure all caching is disabled entirely or you end up with major issues, and I am not sure about official support for this; 1 and 3 are straightforward ... Sven On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards wrote: > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to > connect to an existing (traditional, "bare metal") GPFS filesystem owning > cluster? > > This is not using GPFS for openstack block/image storage - but using GPFS > as a "NAS" service, with openstack guest instances as as a "GPFS client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in Scotland, > with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed...
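For option 2 above (attaching the LUNs to the guest as virtio block devices with host caching disabled), a hedged sketch of what that can look like with plain qemu-kvm rather than the OpenStack/libvirt layer; device paths, image name and sizes are assumptions, and the support caveat above still stands:

    # boot a guest and hand it a multipath SAN LUN as a raw virtio disk;
    # cache=none bypasses the host page cache, aio=native uses kernel async I/O
    qemu-system-x86_64 -enable-kvm -m 8192 -smp 4 \
        -drive file=/var/lib/images/gpfs-client.qcow2,format=qcow2,if=virtio \
        -drive file=/dev/mapper/mpatha,format=raw,if=virtio,cache=none,aio=native

Inside the guest the LUN would show up as /dev/vdb and could then be used as an NSD, subject to the caching and support caveats above.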
URL: From S.J.Thompson at bham.ac.uk Wed Nov 19 19:03:55 2014 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 19 Nov 2014 19:03:55 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk>, Message-ID: Yes, what about the random name nature of a vm image? For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Sven Oehme [oehmes at gmail.com] Sent: 19 November 2014 19:00 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS inside OpenStack guests technically there are multiple ways to do this. 1. you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) 2. you attach the physical disks as virtio block devices 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... Sven On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards > wrote: Hi folks, Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". --- Orlando -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chekh at stanford.edu Wed Nov 19 19:37:50 2014 From: chekh at stanford.edu (Alex Chekholko) Date: Wed, 19 Nov 2014 11:37:50 -0800 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk>, Message-ID: <546CF18E.3010802@stanford.edu> Just make the new VMs NFS clients, no? It's so much simpler and the performance is not much less. But you do need to run CNFS in the GPFS cluster. On 11/19/14 11:03 AM, Simon Thompson (Research Computing - IT Services) wrote: > > Yes, what about the random name nature of a vm image? > > For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? > > And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Sven Oehme [oehmes at gmail.com] > Sent: 19 November 2014 19:00 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS inside OpenStack guests > > technically there are multiple ways to do this. > > 1. 
you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) > 2. you attach the physical disks as virtio block devices > 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. > > if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... > > Sven > > > > > > On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards > wrote: > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? > > This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Alex Chekholko chekh at stanford.edu From orlando.richards at ed.ac.uk Wed Nov 19 20:56:32 2014 From: orlando.richards at ed.ac.uk (orlando.richards at ed.ac.uk) Date: Wed, 19 Nov 2014 20:56:32 +0000 (GMT) Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk> Message-ID: On Wed, 19 Nov 2014, Simon Thompson (Research Computing - IT Services) wrote: > > I was asking this question at the gpfs forum on Monday at sc, but there didn't seem to be much in how wr could do it. > > One of the suggestions was to basically use nfs, or there is the Manilla compnents of Openstack coming, but still that isn't really true gpfs access. > NFS should be easy enough - but you can lose a lot of the gpfs good-ness by doing that (acl's, cloning, performance?, etc). > I did wonder about virtio, but whether that would work with gpfs passed from the hosting system. I was more looking for something fairly native - so that we don't have to, for example, start heavily customising the hypervisor stack. In fact - if you're pushing out to a third-party service provider cloud (and that could be your internal organisation's cloud run as a separate service) then you don't have that option at all. I've not dug into virtio much in a basic kvm hypervisor, but one of the guys in EPCC has been trying it out. Initial impressions (once he got it working!) were tarred by terrible performance. I've not caught up with how he got on after that initial look. > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Orlando Richards [orlando.richards at ed.ac.uk] > Sent: 19 November 2014 16:35 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] GPFS inside OpenStack guests > > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to > connect to an existing (traditional, "bare metal") GPFS filesystem > owning cluster? 
> > This is not using GPFS for openstack block/image storage - but using > GPFS as a "NAS" service, with openstack guest instances as as a "GPFS > client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From orlando.richards at ed.ac.uk Wed Nov 19 20:56:38 2014 From: orlando.richards at ed.ac.uk (orlando.richards at ed.ac.uk) Date: Wed, 19 Nov 2014 20:56:38 +0000 (GMT) Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk>, Message-ID: On Wed, 19 Nov 2014, Simon Thompson (Research Computing - IT Services) wrote: > > Yes, what about the random name nature of a vm image? > > > For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? I *think* this bit should be solvable - assuming one can pre-define the range of names the node will have, and can pre-populate your gpfs cluster config with these node names. The guest image should then have the full /var/mmfs tree (pulled from another gpfs node), but with the /var/mmfs/gen/mmfsNodeData file removed. When it starts up, it'll figure out "who" it is and regenerate that file, pull the latest cluster config from the primary config server, and start up. > And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? This bit is where I can see the potential pitfall. OpenStack naturally uses NAT to handle traffic to and from guests - will GPFS cope with nat'ted clients in this way? Fair point on NFS from Alex - but will you get the same multi-threaded performance from NFS compared with GPFS? Also - could you make each hypervisor an NFS server for its guests, thus doing away with the need for CNFS, and removing the potential for the nfs server threads bottlenecking? For instance - if I have 300 worker nodes, and 7 NSD servers - I'd then have 300 NFS servers running, rather than 7 NFS servers. Direct block access to the storage from the hypervisor would also be possible (network configuration permitting), and the NFS traffic would flow only over a "virtual" network within the hypervisor, and so "should" (?) be more efficient. > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Sven Oehme [oehmes at gmail.com] > Sent: 19 November 2014 19:00 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS inside OpenStack guests > > technically there are multiple ways to do this. > > 1. 
you can use the NSD protocol for this, just need to have adequate Network resources (or use PCI pass trough of the network adapter to the guest) > 2. you attach the physical disks as virtio block devices > 3. pass trough of the Block HBA (e.g. FC adapter) into the guest. > > if you use virtio you need to make sure all caching is disabled entirely or you end up with major issues and i am not sure about official support for this, 1 and 3 are straight forward ... > > Sven > > > > > > On Wed, Nov 19, 2014 at 8:35 AM, Orlando Richards > wrote: > Hi folks, > > Does anyone have experience of running GPFS inside OpenStack guests, to connect to an existing (traditional, "bare metal") GPFS filesystem owning cluster? > > This is not using GPFS for openstack block/image storage - but using GPFS as a "NAS" service, with openstack guest instances as as a "GPFS client". > > > --- > Orlando > > > > > -- > -- > Dr Orlando Richards > Research Facilities (ECDF) Systems Leader > Information Services > IT Infrastructure Division > Tel: 0131 650 4994 > skype: orlando.richards > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- -- Dr Orlando Richards Research Facilities (ECDF) Systems Leader Information Services IT Infrastructure Division Tel: 0131 650 4994 skype: orlando.richards The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From S.J.Thompson at bham.ac.uk Thu Nov 20 00:20:44 2014 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 20 Nov 2014 00:20:44 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk> Message-ID: On 19/11/2014 14:56, "orlando.richards at ed.ac.uk" wrote: >> >>And how about attaching to the netowkrk as neutron networking uses per >>tenant networks, so how would you actually get access to the gpfs >>cluster? > >This bit is where I can see the potential pitfall. OpenStack naturally >uses NAT to handle traffic to and from guests - will GPFS cope with >nat'ted clients in this way? Well, not necessarily, I was thinking about this and potentially you could create an external shared network which is bound to your GPFS interface, though there?s possible security questions maybe around exposing a real internal network device into a VM. I think there is also a Mellanox driver for the VPI Pro cards which allow you to pass the card through to instances. I can?t remember if that was just acceleration for Ethernet or if it could do IB as well. >Also - could you make each hypervisor an NFS server for its guests, thus >doing away with the need for CNFS, and removing the potential for the nfs >server threads bottlenecking? For instance - if I have 300 worker nodes, >and 7 NSD servers - I'd then have 300 NFS servers running, rather than 7 Would you then not need to have 300 server licenses though? 
Simon From jonathan at buzzard.me.uk Thu Nov 20 10:03:01 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 20 Nov 2014 10:03:01 +0000 Subject: [gpfsug-discuss] GPFS inside OpenStack guests In-Reply-To: References: <546CC6E0.1010800@ed.ac.uk> , Message-ID: <1416477781.4171.23.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-11-19 at 20:56 +0000, orlando.richards at ed.ac.uk wrote: > On Wed, 19 Nov 2014, Simon Thompson (Research Computing - IT Services) > wrote: > > > > > Yes, what about the random name nature of a vm image? > > > > > > For example I spin up a new vm, how does it join the gpfs cluster to be able to use nsd protocol? > > > I *think* this bit should be solvable - assuming one can pre-define the > range of names the node will have, and can pre-populate your gpfs cluster > config with these node names. The guest image should then have the full > /var/mmfs tree (pulled from another gpfs node), but with the > /var/mmfs/gen/mmfsNodeData file removed. When it starts up, it'll figure > out "who" it is and regenerate that file, pull the latest cluster config > from the primary config server, and start up. It's perfectly solvable with a bit of scripting and putting the cluster into admin mode central. > > > > > And how about attaching to the netowkrk as neutron networking uses per tenant networks, so how would you actually get access to the gpfs cluster? > > This bit is where I can see the potential pitfall. OpenStack naturally > uses NAT to handle traffic to and from guests - will GPFS cope with > nat'ted clients in this way? Not going to work with NAT. GPFS has some "funny" ideas about networking, but to put it succinctly all the nodes have to be on the same class A, B or C network. Though it considers every address in a class A network to be on the same network even though you may have divided it up internally into different networks. Consequently the network model in GPFS is broken. You would need to use bridged mode aka FlatNetworking in OpenStacks for this to work, but surely Jan knows all this. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From janfrode at tanso.net Fri Nov 21 19:35:48 2014 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 21 Nov 2014 20:35:48 +0100 Subject: [gpfsug-discuss] Gathering node/fs statistics ? Message-ID: <20141121193548.GA11920@mushkin.tanso.net> I'm considering writing a Performance CoPilot agent (PMDA, Performance Metrics Domain Agent) for GPFS, and would like to collect all/most of the metrics that are already available in the gpfs SNMP agent -- ideally without using SNMP.. So, could someone help me with where to find GPFS performance data? I've noticed "mmfsadm" has a "resetstats" option, but what are these stats / where can I find them? All in mmpmon? Also the GPFS-MIB.txt seems to point at some commands I'm unfamiliar with: -- all other node data from EE "get nodes" command -- Status info from EE "get fs -b" command -- Performance data from mmpmon "gfis" command -- Storage pool table comes from EE "get pools" command -- Storage pool data comes from SDR and EE "get pools" command -- Disk data from EE "get fs" command -- Disk performance data from mmpmon "ds" command: -- From mmpmon nc: Any idea what 'EE "get nodes"' is? And what do they mean by 'mmpmon "gfis"', "nc" or "ds"? These commands doesn't work when fed to mmpmon.. 
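On the mmpmon part of the question: the "gfis", "ds" and "nc" strings in the MIB comments look like internal labels rather than requests you can feed to mmpmon, so the sketch below sticks to the documented io_s and fs_io_s requests, driven from a command file in the machine-readable -p format, which is probably the easiest thing for a PMDA to parse (file name and interval are made up):

    # node-aggregate and per-filesystem I/O counters, repeated forever (-r 0)
    # every 10 seconds (-d is in milliseconds), in prescribed/parseable (-p) form
    printf 'io_s\nfs_io_s\n' > /tmp/mmpmon.cmd
    /usr/lpp/mmfs/bin/mmpmon -p -i /tmp/mmpmon.cmd -r 0 -d 10000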
-jf From oehmes at gmail.com Fri Nov 21 20:15:16 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 21 Nov 2014 12:15:16 -0800 Subject: [gpfsug-discuss] Gathering node/fs statistics ? In-Reply-To: <20141121193548.GA11920@mushkin.tanso.net> References: <20141121193548.GA11920@mushkin.tanso.net> Message-ID: Hi, you should take a look at the following 3 links : my performance talk about GPFS , take a look at the dstat plugin mentioned in the charts : http://www.gpfsug.org/wp-content/uploads/2014/05/UG10_GPFS_Performance_Session_v10.pdf documentation about the mmpmon interface and use in GPFS : http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_mmpmonch.htm documentation about GSS/ESS/GNR in case you care about this as well and its additional mmpmon commands : http://www-01.ibm.com/support/knowledgecenter/SSFKCN/bl1du14a.pdf Sven On Fri, Nov 21, 2014 at 11:35 AM, Jan-Frode Myklebust wrote: > I'm considering writing a Performance CoPilot agent (PMDA, Performance > Metrics Domain Agent) for GPFS, and would like to collect all/most of > the metrics that are already available in the gpfs SNMP agent -- ideally > without using SNMP.. > > So, could someone help me with where to find GPFS performance data? I've > noticed "mmfsadm" has a "resetstats" option, but what are these stats / > where can I find them? All in mmpmon? > > Also the GPFS-MIB.txt seems to point at some commands I'm unfamiliar > with: > > -- all other node data from EE "get nodes" command > -- Status info from EE "get fs -b" command > -- Performance data from mmpmon "gfis" command > -- Storage pool table comes from EE "get pools" command > -- Storage pool data comes from SDR and EE "get pools" command > -- Disk data from EE "get fs" command > -- Disk performance data from mmpmon "ds" command: > -- From mmpmon nc: > > > Any idea what 'EE "get nodes"' is? And what do they mean by 'mmpmon > "gfis"', "nc" or "ds"? These commands doesn't work when fed to mmpmon.. > > > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Nov 21 20:29:05 2014 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 21 Nov 2014 14:29:05 -0600 Subject: [gpfsug-discuss] Gathering node/fs statistics ? In-Reply-To: <20141121193548.GA11920@mushkin.tanso.net> References: <20141121193548.GA11920@mushkin.tanso.net> Message-ID: You might want to look at Arxview, www.arxscan.com. I've been working with them and they have good GPFS and Storage monitoring based on mmpmon. Lightweight too. Bob Oesterlin Nuance Communications On Friday, November 21, 2014, Jan-Frode Myklebust wrote: > I'm considering writing a Performance CoPilot agent (PMDA, Performance > Metrics Domain Agent) for GPFS, and would like to collect all/most of > the metrics that are already available in the gpfs SNMP agent -- ideally > without using SNMP.. > > So, could someone help me with where to find GPFS performance data? I've > noticed "mmfsadm" has a "resetstats" option, but what are these stats / > where can I find them? All in mmpmon? 
> > Also the GPFS-MIB.txt seems to point at some commands I'm unfamiliar > with: > > -- all other node data from EE "get nodes" command > -- Status info from EE "get fs -b" command > -- Performance data from mmpmon "gfis" command > -- Storage pool table comes from EE "get pools" command > -- Storage pool data comes from SDR and EE "get pools" command > -- Disk data from EE "get fs" command > -- Disk performance data from mmpmon "ds" command: > -- From mmpmon nc: > > > Any idea what 'EE "get nodes"' is? And what do they mean by 'mmpmon > "gfis"', "nc" or "ds"? These commands doesn't work when fed to mmpmon.. > > > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From sabujp at gmail.com Fri Nov 21 22:50:02 2014 From: sabujp at gmail.com (Sabuj Pattanayek) Date: Fri, 21 Nov 2014 16:50:02 -0600 Subject: [gpfsug-discuss] any difference with the filespace view mmbackup sees from a global snapshot vs a snapshot on -j root with only 1 independent fileset (root)? Message-ID: Hi all, We're running 3.5.0.19. Is there any difference in terms of the view of the filespace that mmbackup sees and then passes to TSM if we run mmbackup against a global snapshot vs a snapshot on -j root, if we only have and ever plan on having one independent fileset (root)? It doesn't look like it to me just from ls, but just verifying. We want to get away from using a global snapshot if possible (and start using -j root snapshots instead) because for some reason it looks like it takes much, much longer to run mmdelsnapshot on a global snapshot vs a snapshot on the root fileset. Thanks, Sabuj -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Mon Nov 24 21:22:19 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Mon, 24 Nov 2014 22:22:19 +0100 Subject: [gpfsug-discuss] restripe or not Message-ID: <5473A18B.7000702@ugent.be> hi all, we are going to expand an existing filesystem by approx 50% capacity. the current filesystem is 75% full. we are in downtime (for more than just this reason), so we can take the IO rebalance hit for a while (say max 48 hours). my questions: a. do we really need to rebalance? the mmadddisk page suggests normally it's going to be ok, but i never understood that. new data will end up mainly on new disks, so wrt performance, this can't really work out, can it? b. can we change the priority of rebalancing somehow (fewer nodes taking part in the rebalance?) c. once we start the rebalance, how safe is it to stop with kill or ctrl-c (or can we say eg. rebalance 25% now, rest later?) (and how often can we do this? eg a daily cron job to restripe at max one hour per day, would this cause issues in the long term?) many thanks, stijn From zgiles at gmail.com Mon Nov 24 23:14:21 2014 From: zgiles at gmail.com (Zachary Giles) Date: Mon, 24 Nov 2014 18:14:21 -0500 Subject: [gpfsug-discuss] restripe or not In-Reply-To: <5473A18B.7000702@ugent.be> References: <5473A18B.7000702@ugent.be> Message-ID: Interesting question.. Just some thoughts: Not an expert on restriping myself: * Your new storage -- is it the same size, shape, speed as the old storage? If not, then are you going to add it to the same storage pool, or an additional storage pool?
If additional, restripe is not needed, as you can't / don't need to restripe across storage pools, the data will be in one or the other. However, you of course will need to make a policy to place data correctly. Of course, if you're going to double your storage and all your new data will be written to the new disks, then you may be leaving quite a bit of capacity on the floor. * mmadddisk man page and normal balancing -- yes, we've seen this suggestion as well -- that is, that new data will generally fill across the cluster and eventually fill in the gaps. We didn't restripe on a much smaller storage pool and it eventually did balance out, however, it was also a "tier 1" where data is migrated out often. If I were doubling my primary storage with more of the exact same disks, I'd probably restripe. * Stopping a restripe -- I'm """ Pretty Sure """ you can stop a restripe safely with a Ctrl-C. I'm """ Pretty Sure """ we've done that a few times ourselves with no problem. I remember I was going to restripe something but the estimates were too high and so I stopped it. I'd feel fairly confident in doing it, but I don't want to take responsibility for your storage. :) :) I don't think there's a need to restripe every hour or anything. If you're generally balanced at one point, you'd probably continue to be under normal operation. On Mon, Nov 24, 2014 at 4:22 PM, Stijn De Weirdt wrote: > hi all, > > we are going to expand an existing filestytem with approx 50% capacity. > the current filesystem is 75% full. > > we are in downtime (for more then just this reason), so we can take the IO > rebalance hit for a while (say max 48hours). > > my questions: > a. do we really need to rebalance? the mmadddisk page suggest normally > it's going to be ok, but i never understood that. new data will end up > mainly on new disks, so wrt to performance, this can't really work out, can > it? > b. can we change the priority of rebalancing somehow (fewer nodes taking > part in the rebalance?) > c. once we start the rebalance, how save is it to stop with kill or ctrl-c > (or can we say eg. rebalance 25% now, rest later?) > (and how often can we do this? eg a daily cron job to restripe at max one > hour per day, would this cause issue in the long term > > > many thanks, > > stijn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Tue Nov 25 02:01:06 2014 From: oester at gmail.com (Bob Oesterlin) Date: Mon, 24 Nov 2014 20:01:06 -0600 Subject: [gpfsug-discuss] restripe or not In-Reply-To: <5473A18B.7000702@ugent.be> References: <5473A18B.7000702@ugent.be> Message-ID: In general, the need to restripe after a disk add is dependent on a number of factors, as has been pointed out.. A couple of other thoughts/suggestions: - One thing you might consider (depending on your pattern of read/write traffic), is selectively suspending one or more of the existing NSDs, forcing GPFS to write new blocks to the new NSDs. That way at least some of the new data is being written to the new storage by default, rather than using up blocks on the existing NSDs. You can suspend/resume disks at any time. - You can pick a subset of nodes to perform the restripe with "mmrestripefs -N node1,node2,..." 
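For example (device and server names are made up here, substitute your own; -b asks mmrestripefs to rebalance existing data across all disks):

    # rebalance, doing the work only on the listed NSD servers
    mmrestripefs gpfs0 -b -N nsd01,nsd02,nsd03

-N should also take a node class, if you keep one defined for your NSD servers.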
Keep in mind you'll get much better performance and less impact to the filesystem if you choose NSD servers with direct access to the disk. - Resume of restripe: Yes, you can do this, no harm, done it many times. You can track the balance of the disks using "mmdf ". This is a pretty intensive command, so I wouldn't run in frequently. Check it a few times each day, see if the data balance is improving by itself. When you stop/restart it, the restripe doesn't pick up exactly where it left off, it's going to scan the entire file system again. - You can also restripe single files if the are large and get a heavy I/O (mmrestripefile) Bob Oesterlin Nuance Communications On Mon, Nov 24, 2014 at 3:22 PM, Stijn De Weirdt wrote: > hi all, > > we are going to expand an existing filestytem with approx 50% capacity. > the current filesystem is 75% full. > > we are in downtime (for more then just this reason), so we can take the IO > rebalance hit for a while (say max 48hours). > > my questions: > a. do we really need to rebalance? the mmadddisk page suggest normally > it's going to be ok, but i never understood that. new data will end up > mainly on new disks, so wrt to performance, this can't really work out, can > it? > b. can we change the priority of rebalancing somehow (fewer nodes taking > part in the rebalance?) > c. once we start the rebalance, how save is it to stop with kill or ctrl-c > (or can we say eg. rebalance 25% now, rest later?) > (and how often can we do this? eg a daily cron job to restripe at max one > hour per day, would this cause issue in the long term > > > many thanks, > > stijn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Tue Nov 25 07:17:56 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 25 Nov 2014 08:17:56 +0100 Subject: [gpfsug-discuss] restripe or not In-Reply-To: References: <5473A18B.7000702@ugent.be> Message-ID: <54742D24.8090602@ugent.be> hi zachary, > * Your new storage -- is it the same size, shape, speed as the old storage? yes. we created and used it as "test" filesystem on the same hardware when we started. now we are shrinking the test filesystem and adding the free disks to the production one. > If not, then are you going to add it to the same storage pool, or an > additional storage pool? If additional, restripe is not needed, as you > can't / don't need to restripe across storage pools, the data will be in > one or the other. However, you of course will need to make a policy to > place data correctly. sure, but in this case, they end up in teh same pool. > Of course, if you're going to double your storage and all your new data > will be written to the new disks, then you may be leaving quite a bit of > capacity on the floor. > > * mmadddisk man page and normal balancing -- yes, we've seen this > suggestion as well -- that is, that new data will generally fill across the > cluster and eventually fill in the gaps. We didn't restripe on a much > smaller storage pool and it eventually did balance out, however, it was > also a "tier 1" where data is migrated out often. If I were doubling my > primary storage with more of the exact same disks, I'd probably restripe. more then half of the data on the current filesystem is more or less static (we expect it to stay there 2-3 year unmodified). 
similar data will be added in the near future. > > * Stopping a restripe -- I'm """ Pretty Sure """ you can stop a restripe > safely with a Ctrl-C. I'm """ Pretty Sure """ we've done that a few times > ourselves with no problem. I remember I was going to restripe something but > the estimates were too high and so I stopped it. I'd feel fairly confident > in doing it, but I don't want to take responsibility for your storage. :) yeah, i've also remember cancelling a restripe and i'm pretty sure it ddin't cause problems (i would certainly remember the problems ;) i'm looking for some further confirmation (or e.g. a reference to some docuemnt that says so. i vaguely remember sven(?) saying this on the lodon gpfs user day this year. > :) I don't think there's a need to restripe every hour or anything. If > you're generally balanced at one point, you'd probably continue to be under > normal operation. i was thinking to spread the total restripe over one or 2 hour periods each days the coming week(s); but i'm now realising this might not be the best idea, because it will rebalance any new data as well, slowing down the bulk rebalancing. anyway, thanks for the feedback. i'll probably let the rebalance run for 48 hours and see how far it got by that time. stijn > > > > > > On Mon, Nov 24, 2014 at 4:22 PM, Stijn De Weirdt > wrote: > >> hi all, >> >> we are going to expand an existing filestytem with approx 50% capacity. >> the current filesystem is 75% full. >> >> we are in downtime (for more then just this reason), so we can take the IO >> rebalance hit for a while (say max 48hours). >> >> my questions: >> a. do we really need to rebalance? the mmadddisk page suggest normally >> it's going to be ok, but i never understood that. new data will end up >> mainly on new disks, so wrt to performance, this can't really work out, can >> it? >> b. can we change the priority of rebalancing somehow (fewer nodes taking >> part in the rebalance?) >> c. once we start the rebalance, how save is it to stop with kill or ctrl-c >> (or can we say eg. rebalance 25% now, rest later?) >> (and how often can we do this? eg a daily cron job to restripe at max one >> hour per day, would this cause issue in the long term >> >> >> many thanks, >> >> stijn >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From stijn.deweirdt at ugent.be Tue Nov 25 07:23:41 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 25 Nov 2014 08:23:41 +0100 Subject: [gpfsug-discuss] restripe or not In-Reply-To: References: <5473A18B.7000702@ugent.be> Message-ID: <54742E7D.7090009@ugent.be> hi bob, > - One thing you might consider (depending on your pattern of read/write > traffic), is selectively suspending one or more of the existing NSDs, > forcing GPFS to write new blocks to the new NSDs. That way at least some of > the new data is being written to the new storage by default, rather than > using up blocks on the existing NSDs. You can suspend/resume disks at any > time. is the gpfs placment weighted with the avalaible volume? i'd rather not make this a manual operation. > > - You can pick a subset of nodes to perform the restripe with "mmrestripefs > -N node1,node2,..." 
Keep in mind you'll get much better performance and > less impact to the filesystem if you choose NSD servers with direct access > to the disk. yes and i no i guess, our nsds see all disks, but the problem with nsds is that they don't honour any roles (our primary nsds have the preferred path to the controller and lun, meaning all access from non-primary nsd to that disk is suboptimal). > > - Resume of restripe: Yes, you can do this, no harm, done it many times. > You can track the balance of the disks using "mmdf ". This is a > pretty intensive command, so I wouldn't run in frequently. Check it a few > times each day, see if the data balance is improving by itself. When you thanks for the tip to monitor it with mmdf! > stop/restart it, the restripe doesn't pick up exactly where it left off, > it's going to scan the entire file system again. yeah, i realised that this is a flaw in my "one-hour a day" restripe idea ;) > > - You can also restripe single files if the are large and get a heavy I/O > (mmrestripefile) excellent tip! forgot about that one. if the rebalnce is to slow, i can run this on the static data. thanks a lot for the feedback stijn > > Bob Oesterlin > Nuance Communications > > > On Mon, Nov 24, 2014 at 3:22 PM, Stijn De Weirdt > wrote: > >> hi all, >> >> we are going to expand an existing filestytem with approx 50% capacity. >> the current filesystem is 75% full. >> >> we are in downtime (for more then just this reason), so we can take the IO >> rebalance hit for a while (say max 48hours). >> >> my questions: >> a. do we really need to rebalance? the mmadddisk page suggest normally >> it's going to be ok, but i never understood that. new data will end up >> mainly on new disks, so wrt to performance, this can't really work out, can >> it? >> b. can we change the priority of rebalancing somehow (fewer nodes taking >> part in the rebalance?) >> c. once we start the rebalance, how save is it to stop with kill or ctrl-c >> (or can we say eg. rebalance 25% now, rest later?) >> (and how often can we do this? eg a daily cron job to restripe at max one >> hour per day, would this cause issue in the long term >> >> >> many thanks, >> >> stijn >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > From L.A.Hurst at bham.ac.uk Tue Nov 25 10:45:51 2014 From: L.A.Hurst at bham.ac.uk (Laurence Alexander Hurst) Date: Tue, 25 Nov 2014 10:45:51 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users Message-ID: Hi all, We have noticed that once users are deleted (gone entirely from the passwd backend) and all their files removed, their uid is still reported by GPFS? quota tools (albeit with zero files and space usage). Does anyone know if there is a way to clear out these spurious entries or is it a case that once a uid is in the quota database its there forever regardless of if that uid is still in use and has quota to record? Many Thanks, Laurence From ewahl at osc.edu Tue Nov 25 13:52:55 2014 From: ewahl at osc.edu (Wahl, Edward) Date: Tue, 25 Nov 2014 13:52:55 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: References: Message-ID: <1416923575.2343.18.camel@localhost.localdomain> Do you still have policies or filesets associated with these users? 
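Worth checking with something like the following (filesystem name, mount point and uid are placeholders):

    # any filesets or installed policy rules still referring to them?
    mmlsfileset gpfs -L
    mmlspolicy gpfs -L

    # anything on disk still owned by the old uid?
    olduid=12345        # placeholder uid
    find /gpfs -xdev -uid "$olduid" -ls
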
Ed Wahl OSC On Tue, 2014-11-25 at 10:45 +0000, Laurence Alexander Hurst wrote: > Hi all, > > We have noticed that once users are deleted (gone entirely from the passwd backend) and all their files removed, their uid is still reported by GPFS? quota tools (albeit with zero files and space usage). > > Does anyone know if there is a way to clear out these spurious entries or is it a case that once a uid is in the quota database its there forever regardless of if that uid is still in use and has quota to record? > > Many Thanks, > > Laurence > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Tue Nov 25 14:00:29 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Nov 2014 14:00:29 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: References: Message-ID: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> On Tue, 2014-11-25 at 10:45 +0000, Laurence Alexander Hurst wrote: > Hi all, > > We have noticed that once users are deleted (gone entirely from the > passwd backend) and all their files removed, their uid is still > reported by GPFS? quota tools (albeit with zero files and space usage). > There is something somewhere that references them, because they do disappear. I know because I cleared out a GPFS file system that had files and directories used by "depreciated" user and group names, and the check I was using to make sure I had got everything belonging to a particular user or group was mmrepquota. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From stijn.deweirdt at ugent.be Tue Nov 25 16:25:58 2014 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 25 Nov 2014 17:25:58 +0100 Subject: [gpfsug-discuss] gpfs.gnr updates Message-ID: <5474AD96.3050006@ugent.be> hi all, does anyone know where we can find the release notes and update rpms for gpfs.gnr? we logged a case with ibm a while ago, and we assumed that the fix for the issue was part of the regular gpfs updates (we assumed as much from the conversation with ibm tech support). many thanks, stijn From L.A.Hurst at bham.ac.uk Wed Nov 26 10:14:26 2014 From: L.A.Hurst at bham.ac.uk (Laurence Alexander Hurst) Date: Wed, 26 Nov 2014 10:14:26 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> Message-ID: Hmm, mmrepquota is reporting no files owned by any of the users in question. I?ll see if `find` disagrees. They have the default fileset user quotas applied, so they?re not users we?ve edited to grant quota extensions to. We have had a problem (which IBM have acknowledged, iirc) whereby it is not possible to reset a user?s quota back to the default if it has been modified, perhaps this is related? I?ll see if `find` turns anything up or I?ll raise a ticket with IBM and see what they think. I?ve pulled out a single example, but all 75 users I have are the same. mmrepquota gpfs | grep 8695 8695 nbu USR 0 0 5368709120 0 none | 0 0 0 0 none 8695 bb USR 0 0 1073741824 0 none | 0 0 0 0 none Thanks for your input. 
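If find is slow over the whole tree, a policy scan limited to the uid is a quicker cross-check. Roughly (rule names, paths and the uid are only an illustration):

    cat > /tmp/orphans.pol <<'EOF'
    RULE 'ext' EXTERNAL LIST 'orphans' EXEC ''
    RULE 'find-orphans' LIST 'orphans' WHERE USER_ID = 8695
    EOF
    mmapplypolicy gpfs -P /tmp/orphans.pol -I defer -f /tmp/orphans
    # matches should land in /tmp/orphans.list.orphans
    # (the exact name depends on the -f prefix and the list name)
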
Laurence On 25/11/2014 14:00, "Jonathan Buzzard" wrote: >On Tue, 2014-11-25 at 10:45 +0000, Laurence Alexander Hurst wrote: >> Hi all, >> >> We have noticed that once users are deleted (gone entirely from the >> passwd backend) and all their files removed, their uid is still >> reported by GPFS? quota tools (albeit with zero files and space usage). >> > >There is something somewhere that references them, because they do >disappear. I know because I cleared out a GPFS file system that had >files and directories used by "depreciated" user and group names, and >the check I was using to make sure I had got everything belonging to a >particular user or group was mmrepquota. > >JAB. > >-- >Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk >Fife, United Kingdom. > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chair at gpfsug.org Thu Nov 27 09:21:30 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 27 Nov 2014 09:21:30 +0000 Subject: [gpfsug-discuss] gpfs.gnr updates In-Reply-To: <5474AD96.3050006@ugent.be> References: <5474AD96.3050006@ugent.be> Message-ID: <5476ED1A.8050504@gpfsug.org> Hi Stijn, As far as I am aware, GNR updates are not publicly available for download. You should approach your reseller / IBM Business partner who should be able to supply you with the updates. IBMers, please feel free to correct this statement if in error. Jez On 25/11/14 16:25, Stijn De Weirdt wrote: > hi all, > > does anyone know where we can find the release notes and update rpms > for gpfs.gnr? > we logged a case with ibm a while ago, and we assumed that the fix for > the issue was part of the regular gpfs updates (we assumed as much > from the conversation with ibm tech support). > > many thanks, > > stijn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Thu Nov 27 09:47:59 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Nov 2014 09:47:59 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> Message-ID: <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: > Hmm, mmrepquota is reporting no files owned by any of the users in > question. I?ll see if `find` disagrees. > They have the default fileset > user quotas applied, so they?re not users we?ve edited to grant quota > extensions to. We have had a problem (which IBM have acknowledged, iirc) > whereby it is not possible to reset a user?s quota back to the default if > it has been modified, perhaps this is related? I?ll see if `find` turns > anything up or I?ll raise a ticket with IBM and see what they think. > > I?ve pulled out a single example, but all 75 users I have are the same. > > mmrepquota gpfs | grep 8695 > 8695 nbu USR 0 0 5368709120 0 > none | 0 0 0 0 none > 8695 bb USR 0 0 1073741824 0 > none | 0 0 0 0 none > While the number of files and usage is zero look at those "in doubt" numbers. Until these also fall to zero then the users are not going to disappear from the quota reporting would be my guess. Quite why the "in doubt" numbers are still so large is another question. 
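If stale in-doubt values were what kept an entry visible, an mmcheckquota run would normally reconcile them -- note it rescans the whole filesystem, so best run in a quiet window:

    mmcheckquota gpfs      # -v reports the discrepancies it corrects
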
I have vague recollections of this happening to me when I deleted large amounts of data belonging to a user down to zero when I was clearing the file system up I mentioned before. Though to be honest most of my clearing up was identifying who the files really belonged to (there had in the distance past been a change of usernames; gone from local usernames to using the university wide ones and not everyone had claimed their files. All related to a move to using Active Directory) and doing chown's on the data. I think what happens is when the file number goes to zero the quota system stops updating for that user and if there is anything "in doubt" it never gets updated and sticks around forever. Might be worth creating a couple of files for the user in the appropriate filesets and then give it a bit of time and see if the output of mmrepquota matches what you believe is the real case. If this works and the "in doubt" number goes to zero I would at this point do a chown to a different user that is not going away and then delete the files. Something else to consider is that they might be in an ACL somewhere which is confusing the quota system. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From peserocka at gmail.com Thu Nov 27 10:01:55 2014 From: peserocka at gmail.com (P Serocka) Date: Thu, 27 Nov 2014 18:01:55 +0800 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> Message-ID: Any chance to run mmcheckquota? which should remove all "doubt"... On 2014 Nov 27. md, at 17:47 st, Jonathan Buzzard wrote: > On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: >> Hmm, mmrepquota is reporting no files owned by any of the users in >> question. I?ll see if `find` disagrees. >> They have the default fileset >> user quotas applied, so they?re not users we?ve edited to grant quota >> extensions to. We have had a problem (which IBM have acknowledged, iirc) >> whereby it is not possible to reset a user?s quota back to the default if >> it has been modified, perhaps this is related? I?ll see if `find` turns >> anything up or I?ll raise a ticket with IBM and see what they think. >> >> I?ve pulled out a single example, but all 75 users I have are the same. >> >> mmrepquota gpfs | grep 8695 >> 8695 nbu USR 0 0 5368709120 0 >> none | 0 0 0 0 none >> 8695 bb USR 0 0 1073741824 0 >> none | 0 0 0 0 none >> > > While the number of files and usage is zero look at those "in doubt" > numbers. Until these also fall to zero then the users are not going to > disappear from the quota reporting would be my guess. Quite why the "in > doubt" numbers are still so large is another question. I have vague > recollections of this happening to me when I deleted large amounts of > data belonging to a user down to zero when I was clearing the file > system up I mentioned before. Though to be honest most of my clearing up > was identifying who the files really belonged to (there had in the > distance past been a change of usernames; gone from local usernames to > using the university wide ones and not everyone had claimed their files. > All related to a move to using Active Directory) and doing chown's on > the data. 
> > I think what happens is when the file number goes to zero the quota > system stops updating for that user and if there is anything "in doubt" > it never gets updated and sticks around forever. > > Might be worth creating a couple of files for the user in the > appropriate filesets and then give it a bit of time and see if the > output of mmrepquota matches what you believe is the real case. If this > works and the "in doubt" number goes to zero I would at this point do a > chown to a different user that is not going away and then delete the > files. > > Something else to consider is that they might be in an ACL somewhere > which is confusing the quota system. > > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Thu Nov 27 10:02:03 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Nov 2014 10:02:03 +0000 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> Message-ID: <1417082523.4171.104.camel@buzzard.phy.strath.ac.uk> On Thu, 2014-11-27 at 09:47 +0000, Jonathan Buzzard wrote: > On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: > > Hmm, mmrepquota is reporting no files owned by any of the users in > > question. I?ll see if `find` disagrees. > > They have the default fileset > > user quotas applied, so they?re not users we?ve edited to grant quota > > extensions to. We have had a problem (which IBM have acknowledged, iirc) > > whereby it is not possible to reset a user?s quota back to the default if > > it has been modified, perhaps this is related? I?ll see if `find` turns > > anything up or I?ll raise a ticket with IBM and see what they think. > > > > I?ve pulled out a single example, but all 75 users I have are the same. > > > > mmrepquota gpfs | grep 8695 > > 8695 nbu USR 0 0 5368709120 0 > > none | 0 0 0 0 none > > 8695 bb USR 0 0 1073741824 0 > > none | 0 0 0 0 none > > > > While the number of files and usage is zero look at those "in doubt" > numbers. Ignore that those are quota numbers. Hard when the column headings are missing. Anyway a "Homer Simpson" momentum coming up... Simple answer really remove the quotas for those users in those file sets (I am presuming they are per fileset user hard limits). They are sticking around in mmrepquota because they have a hard limit set. D'oh! JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From peserocka at gmail.com Thu Nov 27 10:06:31 2014 From: peserocka at gmail.com (P Serocka) Date: Thu, 27 Nov 2014 18:06:31 +0800 Subject: [gpfsug-discuss] Quotas and non-existant users In-Reply-To: <1417082523.4171.104.camel@buzzard.phy.strath.ac.uk> References: <1416924029.4171.69.camel@buzzard.phy.strath.ac.uk> <1417081679.4171.99.camel@buzzard.phy.strath.ac.uk> <1417082523.4171.104.camel@buzzard.phy.strath.ac.uk> Message-ID: <44A03A01-4010-4210-8892-2AE37451EEFA@gmail.com> ;-) Ignore my other message on mmcheckquota then. On 2014 Nov 27. 
md, at 18:02 st, Jonathan Buzzard wrote: > On Thu, 2014-11-27 at 09:47 +0000, Jonathan Buzzard wrote: >> On Wed, 2014-11-26 at 10:14 +0000, Laurence Alexander Hurst wrote: >>> Hmm, mmrepquota is reporting no files owned by any of the users in >>> question. I?ll see if `find` disagrees. >>> They have the default fileset >>> user quotas applied, so they?re not users we?ve edited to grant quota >>> extensions to. We have had a problem (which IBM have acknowledged, iirc) >>> whereby it is not possible to reset a user?s quota back to the default if >>> it has been modified, perhaps this is related? I?ll see if `find` turns >>> anything up or I?ll raise a ticket with IBM and see what they think. >>> >>> I?ve pulled out a single example, but all 75 users I have are the same. >>> >>> mmrepquota gpfs | grep 8695 >>> 8695 nbu USR 0 0 5368709120 0 >>> none | 0 0 0 0 none >>> 8695 bb USR 0 0 1073741824 0 >>> none | 0 0 0 0 none >>> >> >> While the number of files and usage is zero look at those "in doubt" >> numbers. > > Ignore that those are quota numbers. Hard when the column headings are > missing. > > Anyway a "Homer Simpson" momentum coming up... > > Simple answer really remove the quotas for those users in those file > sets (I am presuming they are per fileset user hard limits). They are > sticking around in mmrepquota because they have a hard limit set. D'oh! > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
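Following up on that last point: if the aim is to push those explicit per-user limits back to the defaults so the departed uids stop being reported, something along these lines should do it. Syntax is from memory -- check the mmedquota man page for your GPFS level, a numeric uid in place of a user name is assumed to be accepted for accounts that no longer resolve, and per-fileset quota scoping differs between releases:

    # re-establish the default quota limits for the departed uid
    mmedquota -d -u 8695

    # then confirm the entry no longer carries an explicit limit
    mmrepquota -u gpfs | grep 8695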