From secretary at gpfsug.org Wed Jul 1 09:00:51 2015 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Wed, 01 Jul 2015 09:00:51 +0100 Subject: [gpfsug-discuss] Meet the Developers Message-ID: Dear All, We are planning the next 'Meet the Devs' event for Wednesday 29th July, 11am-3pm. Depending on interest, we are looking to hold it in either Manchester or Warwick. The agenda promises to be hands-on and give you the opportunity to speak face to face with the developers of GPFS. Guideline agenda: * Data analytic workloads - development to show and tell UK work on establishing use cases and tighter integration of Spark on top of GPFS * Show the GUI coming in 4.2 * Discuss 4.2 and beyond roadmap * How would you like IP management to work for protocol access? * Optional - Team can demo & discuss NFS/SMB/Object integration into Scale Lunch and refreshments will be provided. Please can you let me know by email if you are interested in attending, along with your preferred venue, by Friday 10th July. Thanks and we hope to see you there! -- Claire O'Toole (née Robson) GPFS User Group Secretary +44 (0)7508 033896 From chair at gpfsug.org Wed Jul 1 09:21:03 2015 From: chair at gpfsug.org (GPFS UG Chair) Date: Wed, 1 Jul 2015 09:21:03 +0100 Subject: [gpfsug-discuss] mailing list change Message-ID: Hi All, We've made a change to the mailing list so that only subscribers are now able to post to the list. We've done this as we've been getting a *lot* of spam held for moderation from non-members, and the occasional legitimate post was getting lost in the spam. If you or colleagues routinely post from a different address from that subscribed to the list, you'll now need to be subscribed (you'll get an error back from the list when you try to post). As it's a mailman list, if you do want to have multiple addresses subscribed, you can of course disable the address from receiving posts via the mailman interface. Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Jul 1 15:21:29 2015 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 1 Jul 2015 07:21:29 -0700 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> Message-ID: <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it officially. there are multiple ways today to limit the impact of restripe and other tasks; the best way to do this is to run the task (using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Daniel Vogel To: "'gpfsug-discuss at gpfsug.org'" Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org Hi Years ago, IBM made some plans to implement "QoS for mmrestripefs, mmdeldisk...". If a "mmfsrestripe" is running, NFS access sees very poor performance.
I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jul 1 15:32:50 2015 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Jul 2015 14:32:50 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> Message-ID: Sven, It?s been a while since I tried that, but the last time I tried to limit the impact of a restripe by only running it on a few NSD server nodes it made things worse. Everybody was as slowed down as they would?ve been if I?d thrown every last NSD server we have at it and they were slowed down for longer, since using fewer NSD servers meant the restripe ran longer. What we do is always kick off restripes on a Friday afternoon, throw every NSD server we have at them, and let them run over the weekend. Interactive use is lower then and people don?t notice or care if their batch jobs run longer. Of course, this is all just my experiences. YMMV... Kevin On Jul 1, 2015, at 9:21 AM, Sven Oehme > wrote: Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). 
PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.hunter at yale.edu Wed Jul 1 16:52:07 2015 From: chris.hunter at yale.edu (Chris Hunter) Date: Wed, 01 Jul 2015 11:52:07 -0400 Subject: [gpfsug-discuss] gpfs rdma expels Message-ID: <55940CA7.9010506@yale.edu> Hi UG list, We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of clients use RDMA. We see a large number of expels of rdma clients but less of the tcp clients. Most of the gpfs config is at defaults. We are unclear if any of the non-RDMA config items (eg. Idle socket timeout) would help our issue. Any suggestions on gpfs config parameters we should investigate ? thank-you in advance, chris hunter yale hpc group From viccornell at gmail.com Wed Jul 1 16:58:31 2015 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 1 Jul 2015 16:58:31 +0100 Subject: [gpfsug-discuss] gpfs rdma expels In-Reply-To: <55940CA7.9010506@yale.edu> References: <55940CA7.9010506@yale.edu> Message-ID: <6E28C0FB-2F99-4127-B1F2-272BA2532330@gmail.com> If it used to work then its probably not config. Most expels are the result of network connectivity problems. If your cluster is not too big try looking at ping from every node to every other node and look for large latencies. Also look to see who is expelling who. Ie - if your RDMA nodes are being expelled by non-RDMA nodes. It may point to a weakness in your network which GPFS ,being as it is a great finder of weaknesses, is having a problem with. Also more details (network config etc) will elicit more detailed suggestions. Cheers, Vic > On 1 Jul 2015, at 16:52, Chris Hunter wrote: > > Hi UG list, > We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of clients use RDMA. We see a large number of expels of rdma clients but less of the tcp clients. > Most of the gpfs config is at defaults. We are unclear if any of the non-RDMA config items (eg. Idle socket timeout) would help our issue. Any suggestions on gpfs config parameters we should investigate ? 
> > thank-you in advance, > chris hunter > yale hpc group > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stijn.deweirdt at ugent.be Thu Jul 2 07:42:30 2015 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Thu, 02 Jul 2015 08:42:30 +0200 Subject: [gpfsug-discuss] gpfs rdma expels In-Reply-To: <6E28C0FB-2F99-4127-B1F2-272BA2532330@gmail.com> References: <55940CA7.9010506@yale.edu> <6E28C0FB-2F99-4127-B1F2-272BA2532330@gmail.com> Message-ID: <5594DD56.6010302@ugent.be> do you use ipoib for the rdma nodes or regular ethernet? and what OS are you on? we had issues with the el7.1 kernel and ipoib; there's packet loss with ipoib and mlnx_ofed (and mlnx engineering told us that it might be in basic ofed from el7.1 too; 7.0 kernels are ok) and client expels were the first signs on our setup. stijn On 07/01/2015 05:58 PM, Vic Cornell wrote: > If it used to work then it's probably not config. Most expels are the result of network connectivity problems. > > If your cluster is not too big try looking at ping from every node to every other node and look for large latencies. > > Also look to see who is expelling who. Ie - if your RDMA nodes are being expelled by non-RDMA nodes. It may point to a weakness in your network which GPFS, being as it is a great finder of weaknesses, is having a problem with. > > Also more details (network config etc) will elicit more detailed suggestions. > > Cheers, > > Vic > > > >> On 1 Jul 2015, at 16:52, Chris Hunter wrote: >> >> Hi UG list, >> We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of clients use RDMA. We see a large number of expels of rdma clients but less of the tcp clients. >> Most of the gpfs config is at defaults. We are unclear if any of the non-RDMA config items (eg. Idle socket timeout) would help our issue. Any suggestions on gpfs config parameters we should investigate ? >> >> thank-you in advance, >> chris hunter >> yale hpc group >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From Daniel.Vogel at abcsystems.ch Thu Jul 2 08:12:32 2015 From: Daniel.Vogel at abcsystems.ch (Daniel Vogel) Date: Thu, 2 Jul 2015 07:12:32 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> Message-ID: <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> Sven, Yes I agree, but using "-N" to reduce the load does not really help. If I use NFS, for example, as an ESX data store, ESX I/O latency for NFS goes very high and the VMs hang. By the way, I use SSD PCIe cards: perfect "mirror speed" but slow I/O on NFS. The GPFS cluster concept I use is different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSDs. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only feature that will help is QoS for the GPFS admin jobs. I hope we are not far away from this.
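For reference, a minimal sketch of what "using -N" means in practice here - the file system and node names are placeholders, not taken from this setup:

# rebalance gpfs01, letting only nsd03 and nsd04 participate in the work
mmrestripefs gpfs01 -b -N nsd03,nsd04

Only the nodes named after -N drive the restripe, so the work is confined to them; the disks themselves are still kept busy, which is why the latency impact seen by NFS clients is reduced rather than removed.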
Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS]Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From chris.howarth at citi.com Thu Jul 2 08:24:37 2015 From: chris.howarth at citi.com (Howarth, Chris ) Date: Thu, 2 Jul 2015 07:24:37 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> Message-ID: <0609A0AC1B1CA9408D88D4144C5C990B75D89CF5@EXLNMB52.eur.nsroot.net> Daniel ?in our environment we have data and metadata split out onto separate drives in separate servers. We also set the GPFS parameter ?mmchconfig defaultHelperNodes=?list_of_metadata_servers? 
which will automatically only use these nodes for the scan for restriping/rebalancing data (rather than having to specify the ?N option). This dramatically reduced the impact to clients accessing the data nodes while these activities are taking place. Also using SSDs for metadata nodes can make a big improvement. Chris From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Daniel Vogel Sent: Thursday, July 02, 2015 8:13 AM To: 'gpfsug main discussion list' Subject: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS]Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. 
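For illustration, the defaultHelperNodes setting described at the top of this message might be applied roughly as follows - a sketch only, with placeholder node names (a comma-separated node list or a node class is accepted):

# direct helper work for restripe/rebalance scans at the metadata servers by default
mmchconfig defaultHelperNodes="mds01,mds02"
# verify the setting
mmlsconfig defaultHelperNodes

With this in place, commands such as mmrestripefs or mmdeldisk pick up that node list automatically when no -N option is given.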
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From chris.hunter at yale.edu Thu Jul 2 14:01:53 2015 From: chris.hunter at yale.edu (Chris Hunter) Date: Thu, 02 Jul 2015 09:01:53 -0400 Subject: [gpfsug-discuss] gpfs rdma expels Message-ID: <55953641.4010701@yale.edu> Thanks for the feedback. Our network is non-uniform, we have three (uniform) rdma networks connected by narrow uplinks. Previously we used gpfs on one network, now we wish to expand to the other networks. Previous experience shows we see "PortXmitWait" messages from traffic over the narrow uplinks. We find expels happen often from gpfs communication over the narrow uplinks. We acknowledge an inherent weakness with narrow uplinks but for practical reasons it would be difficult to resolve. So the question, is it possible to configure gpfs to be tolerant of non-uniform networks with narrow uplinks ? thanks, chris hunter > On 1 Jul 2015, at 16:52, Chris Hunter wrote: > > Hi UG list, > We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of > clients use RDMA. We see a large number of expels of rdma clients but > less of the tcp clients. Most of the gpfs config is at defaults. We > are unclear if any of the non-RDMA config items (eg. Idle socket > timeout) would help our issue. Any suggestions on gpfs config > parameters we should investigate ? From S.J.Thompson at bham.ac.uk Thu Jul 2 16:43:03 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 2 Jul 2015 15:43:03 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support Message-ID: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon From GARWOODM at uk.ibm.com Thu Jul 2 16:55:42 2015 From: GARWOODM at uk.ibm.com (Michael Garwood7) Date: Thu, 2 Jul 2015 16:55:42 +0100 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Simon, 1. Most of the chef recipes involve installing the various packages required for the protocols, and some of the new performance monitoring packages required for mmperfquery. 
There is a series of steps for proper manual install at http://www-01.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.adv.doc/bl1adv_ces_features.htm but this assumes you have all IBM Samba RPMs and prerequisites installed. The recipes *should* be split out so that at the very least, RPM install is done in its own recipe without configuring or enabling anything... 2. I am not 100% sure what deploying IBM Samba on the cluster will do with regards to sernet samba. As far as I am aware there is no code in the installer or chef recipes to check for other samba deployments running but I may be mistaken. Depending on how sernet samba hooks to GPFS I can't think of any reason why it would cause problems aside from the risk of the protocols not communicating and causing issues with file locks/data overwrites, depending on what workload you have running on samba. 3. I haven't personally seen multicluster deployments done or tested before, but no, it is not officially supported. The installer has been written with the assumption that you are installing to one cluster, so I wouldn't recommend trying with multiple clusters - unforseen consequences :) Regards, Michael Garwood IBM Systems Developer Phone: 44-161-905-4118 E-mail: GARWOODM at uk.ibm.com 40 Blackfriars Street Manchester, M3 2EG United Kingdom IBM United Kingdom Limited Registered in England and Wales with number 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list , Date: 02/07/2015 16:43 Subject: [gpfsug-discuss] 4.1.1 protocol support Sent by: gpfsug-discuss-bounces at gpfsug.org Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Thu Jul 2 17:02:01 2015 From: oester at gmail.com (Bob Oesterlin) Date: Thu, 2 Jul 2015 11:02:01 -0500 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. 
The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 if needed. Overall it works pretty well. You do have the option of doing things manually as well. The guide that describes it is pretty good. If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Hi, > > Just wondering if anyone has looked at the new protocol support stuff in > 4.1.1 yet? > > From what I can see, it wants to use the installer to add things like IBM > Samba onto nodes in the cluster. The docs online seem to list manual > installation as running the chef template, which is hardly manual... > > 1. I'd like to know what is being run on my cluster > 2. It's an existing install which was using sernet samba, so I don't want > to go out and break anything inadvertently > 3. My protocol nodes are in a multicluster, and I understand the installer > doesn't support multicluster. > > (the docs state that multicluster isn't supported but something like it's > expected to work). > > So... Has anyone had a go at this yet and have a set of steps? > > I've started unpicking the chef recipe, but just wondering if anyone had > already had a go at this? > > (and let's not start on the mildly bemusing error when you "enable" the > service with "mmces service enable" (ces service not enabled) - there's > other stuff to enable it)... > > Simon > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 2 19:52:28 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 2 Jul 2015 18:52:28 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Michael, Thanks for that link. This is the docs I'd found before: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_manualprotocols.htm I guess one of the reasons for wanting to unpick is because we already have configuration management tools all in place. I have no issue about GPFS config being inside GPFS, but we really need to know what is going on (and we can manage to get the RPMs all on etc if we know what is needed from the config management tool). I do note that it needs CCR enabled, which we currently don't have. Now I think this was because we saw issues with mmsdrrestore when adding a node that had been reinstalled back into the cluster. I need to check if that is still the case (we work on being able to pull clients, NSDs etc from the cluster and using xcat to reprovision and then a config tool to do the relevant bits to rejoin the cluster, which makes it easier for us to stage kernel, GPFS, OFED updates as we just blat on a new image). I don't really want to have a mix of Sernet and IBM samba on there, so am happy to pull out those bits, but obviously need to get the IBM bits working as well. Multicluster: well, our "protocol" cluster is a separate cluster from the NSD cluster (can't remote expel, might want to add other GPFS clusters to the protocol layer etc).
Of course the multi cluster talks GPFS protocol, so I don?t see any reason why it shouldn?t work, but yes, noted its not supported. Simon From: Michael Garwood7 > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 16:55 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon, 1. Most of the chef recipes involve installing the various packages required for the protocols, and some of the new performance monitoring packages required for mmperfquery. There is a series of steps for proper manual install at http://www-01.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.adv.doc/bl1adv_ces_features.htm but this assumes you have all IBM Samba RPMs and prerequisites installed. The recipes *should* be split out so that at the very least, RPM install is done in its own recipe without configuring or enabling anything... 2. I am not 100% sure what deploying IBM Samba on the cluster will do with regards to sernet samba. As far as I am aware there is no code in the installer or chef recipes to check for other samba deployments running but I may be mistaken. Depending on how sernet samba hooks to GPFS I can't think of any reason why it would cause problems aside from the risk of the protocols not communicating and causing issues with file locks/data overwrites, depending on what workload you have running on samba. 3. I haven't personally seen multicluster deployments done or tested before, but no, it is not officially supported. The installer has been written with the assumption that you are installing to one cluster, so I wouldn't recommend trying with multiple clusters - unforseen consequences :) Regards, Michael Garwood IBM Systems Developer ________________________________ Phone: 44-161-905-4118 E-mail: GARWOODM at uk.ibm.com 40 Blackfriars Street Manchester, M3 2EG United Kingdom IBM United Kingdom Limited Registered in England and Wales with number 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU From: "Simon Thompson (Research Computing - IT Services)" > To: gpfsug main discussion list >, Date: 02/07/2015 16:43 Subject: [gpfsug-discuss] 4.1.1 protocol support Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 2 19:58:12 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 2 Jul 2015 18:58:12 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Bob, Thanks, I?ll have a look through the link Michael sent me and shout if I get stuck? Looks a bit different to the previous way were we running this with ctdb etc. Our protocol nodes are already running 7.1 (though CentOS which means the mmbuildgpl command doesn?t work, would be much nice of course if the init script detected the kernel had changed and did a build etc automagically ?). Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 17:02 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. You do have the option of doing things manually as well. The guide that describes it is pretty good. If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Thu Jul 2 20:03:02 2015 From: oester at gmail.com (Bob Oesterlin) Date: Thu, 2 Jul 2015 14:03:02 -0500 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: On Thu, Jul 2, 2015 at 1:52 PM, Simon Thompson (Research Computing - IT Services) wrote: > I do note that it needs CCR enabled, which we currently don?t have. Now I > think this was because we saw issues with mmsdrestore when adding a node > that had been reinstalled back into the cluster. 
I need to check if that is > still the case (we work on being able to pull clients, NSDs etc from the > cluster and using xcat to reprovision and the a config tool to do the > relevant bits to rejoin the cluster ? makes it easier for us to stage > kernel, GPFS, OFED updates as we just blat on a new image). > Yes, and this is why we couldn't use CCR - our compute nodes are netboot, so they go thru a mmsdrrestore every time they reboot. Now, they have fixed this in 4.1.1, which means if you can get (the cluster) to 4.1.1 and turn on CCR, mmsdrrestore should work. Note to self: Test this out in your sandbox cluster. :-) Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 3 12:22:06 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 3 Jul 2015 11:22:06 +0000 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) Message-ID: Bob, (anyone?) Have you tried mmsdrestore to see if its working in 4.1.1? # mmsdrrestore -p PRIMARY -R /usr/bin/scp Fri 3 Jul 11:56:05 BST 2015: mmsdrrestore: Processing node PRIMARY ccrio initialization failed (err 811) mmsdrrestore: Unable to retrieve GPFS cluster files from CCR. mmsdrrestore: Unexpected error from updateMmfsEnvironment. Return code: 1 mmsdrrestore: Command failed. Examine previous error messages to determine cause. It seems to copy the mmsdrfs file to the local node into /var/mmfs/gen/mmsdrfs but then fails to actually work. Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 20:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support On Thu, Jul 2, 2015 at 1:52 PM, Simon Thompson (Research Computing - IT Services) > wrote: I do note that it needs CCR enabled, which we currently don?t have. Now I think this was because we saw issues with mmsdrestore when adding a node that had been reinstalled back into the cluster. I need to check if that is still the case (we work on being able to pull clients, NSDs etc from the cluster and using xcat to reprovision and the a config tool to do the relevant bits to rejoin the cluster ? makes it easier for us to stage kernel, GPFS, OFED updates as we just blat on a new image). Yes, and this is why we couldn't use CCR - our compute nodes are netboot, so they go thru a mmsdrrestore every time they reboot. Now, they have fixed this in 4.1.1, which means if you can get (the cluster) to 4.1.1 and turn on CCR, mmsdrrestore should work. Note to self: Test this out in your sandbox cluster. :-) Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Fri Jul 3 12:50:31 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 3 Jul 2015 11:50:31 +0000 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: Actually, no just ignore me, it does appear to be fixed in 4.1.1 * I cleaned up the node by removing the 4.1.1 packages, then cleaned up /var/mmfs, but then when the config tool reinstalled, it put 4.1.0 back on and didn?t apply the updates to 4.1.1, so it must have been an older version of mmsdrrestore Simon From: Simon Thompson > Reply-To: gpfsug main discussion list > Date: Friday, 3 July 2015 12:22 To: gpfsug main discussion list > Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) Bob, (anyone?) Have you tried mmsdrestore to see if its working in 4.1.1? # mmsdrrestore -p PRIMARY -R /usr/bin/scp Fri 3 Jul 11:56:05 BST 2015: mmsdrrestore: Processing node PRIMARY ccrio initialization failed (err 811) mmsdrrestore: Unable to retrieve GPFS cluster files from CCR. mmsdrrestore: Unexpected error from updateMmfsEnvironment. Return code: 1 mmsdrrestore: Command failed. Examine previous error messages to determine cause. It seems to copy the mmsdrfs file to the local node into /var/mmfs/gen/mmsdrfs but then fails to actually work. Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 20:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support On Thu, Jul 2, 2015 at 1:52 PM, Simon Thompson (Research Computing - IT Services) > wrote: I do note that it needs CCR enabled, which we currently don?t have. Now I think this was because we saw issues with mmsdrestore when adding a node that had been reinstalled back into the cluster. I need to check if that is still the case (we work on being able to pull clients, NSDs etc from the cluster and using xcat to reprovision and the a config tool to do the relevant bits to rejoin the cluster ? makes it easier for us to stage kernel, GPFS, OFED updates as we just blat on a new image). Yes, and this is why we couldn't use CCR - our compute nodes are netboot, so they go thru a mmsdrrestore every time they reboot. Now, they have fixed this in 4.1.1, which means if you can get (the cluster) to 4.1.1 and turn on CCR, mmsdrrestore should work. Note to self: Test this out in your sandbox cluster. :-) Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Jul 3 13:21:43 2015 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 3 Jul 2015 07:21:43 -0500 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: On Fri, Jul 3, 2015 at 6:22 AM, Simon Thompson (Research Computing - IT Services) wrote: > Have you tried mmsdrestore to see if its working in 4.1.1? Well, no actually :) They told me it was fixed but I have never got 'round to checking it during my beta testing. If it's not, I say submit a PMR and let's get them to fix it - I will do the same. It would be nice to actually use CCR, especially if the new protocol support depends on it. Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oester at gmail.com Fri Jul 3 13:22:37 2015 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 3 Jul 2015 07:22:37 -0500 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: On Fri, Jul 3, 2015 at 6:22 AM, Simon Thompson (Research Computing - IT Services) wrote: > Have you tried mmsdrestore to see if its working in 4.1.1? One thing - did you try this on a pure 4.1.1 cluster with release=LATEST? Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 3 13:28:10 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 3 Jul 2015 12:28:10 +0000 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: It was on a pure cluster with 4.1.1 only. (I had to do that a precursor to start enabling CES). As I mentioned, I messed up with 4.1.0 client installed so it doesn?t work from a mixed version, but did work from pure 4.1.1 Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Friday, 3 July 2015 13:22 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) On Fri, Jul 3, 2015 at 6:22 AM, Simon Thompson (Research Computing - IT Services) > wrote: Have you tried mmsdrestore to see if its working in 4.1.1? One thing - did you try this on a pure 4.1.1 cluster with release=LATEST? Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Fri Jul 3 23:48:38 2015 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 3 Jul 2015 15:48:38 -0700 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch><201507011422.t61EMZmw011626@d01av01.pok.ibm.com> <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> Message-ID: <201507032249.t63Mnffp025995@d03av03.boulder.ibm.com> this triggers a few questions 1. have you tried running it only on a node that doesn't serve NFS data ? 2. what NFS stack are you using ? is this the kernel NFS Server as part of linux means you use cNFS ? if the answer to 2 is yes, have you adjusted the nfsd threads in /etc/sysconfig/nfs ? the default is only 8 and if you run with the default you have a very low number of threads from the outside competing with a larger number of threads doing restripe, increasing the nfsd threads could help. you could also reduce the number of internal restripe threads to try out if that helps mitigating the impact. to try an extreme low value set the following : mmchconfig pitWorkerThreadsPerNode=1 -i and retry the restripe again, to reset it back to default run mmchconfig pitWorkerThreadsPerNode=DEFAULT -i sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Daniel Vogel To: "'gpfsug main discussion list'" Date: 07/02/2015 12:12 AM Subject: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. 
If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoSDaniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel To: "'gpfsug-discuss at gpfsug.org'" Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Jul 6 11:09:08 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 6 Jul 2015 10:09:08 +0000 Subject: [gpfsug-discuss] SMB support and config Message-ID: Hi, (sorry, lots of questions about this stuff at the moment!) I?m currently looking at removing the sernet smb configs we had previously and moving to IBM SMB. I?ve removed all the old packages and only now have gpfs.smb installed on the systems. I?m struggling to get the config tools to work for our environment. We have MS Windows AD Domain for authentication. For various reasons, however doesn?t hold the UIDs/GIDs, which are instead held in a different LDAP directory. In the past, we?d configure the Linux servers running Samba so that NSLCD was configured to get details from the LDAP server. (e.g. getent passwd would return the data for an AD user). The Linux boxes would also be configured to use KRB5 authentication where users were allowed to ssh etc in for password authentication. So as far as Samba was concerned, it would do ?security = ADS? and then we?d also have "idmap config * : backend = tdb2? I.e. Use Domain for authentication, but look locally for ID mapping data. Now I can configured IBM SMB to use ADS for authentication: mmuserauth service create --type ad --data-access-method file --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF --idmap-role subordinate However I can?t see anyway for me to manipulate the config so that it doesn?t use autorid. Using this we end up with: mmsmb config list | grep -i idmap idmap config * : backend autorid idmap config * : range 10000000-299999999 idmap config * : rangesize 1000000 idmap config * : read only yes idmap:cache no It also adds: mmsmb config list | grep -i auth auth methods guest sam winbind (though I don?t think that is a problem). I also can?t change the idmap using the mmsmb command (I think would look like this): # mmsmb config change --option="idmap config * : backend=tdb2" idmap config * : backend=tdb2: [E] Unsupported smb option. More information about smb options is availabe in the man page. I can?t see anything in the docs at: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm That give me a clue how to do what I want. I?d be happy to do some mixture of AD for authentication and LDAP for lookups (rather than just falling back to ?local? from nslcd), but I can?t see a way to do this, and ?manual? seems to stop ADS authentication in Samba. Anyone got any suggestions? Thanks Simon From kallbac at iu.edu Mon Jul 6 23:06:00 2015 From: kallbac at iu.edu (Kallback-Rose, Kristy A) Date: Mon, 6 Jul 2015 22:06:00 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: References: Message-ID: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Just to chime in as another interested party, we do something fairly similar but use sssd instead of nslcd. Very interested to see how accommodating the IBM Samba is to local configuration needs. Best, Kristy On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT Services) wrote: > Hi, > > (sorry, lots of questions about this stuff at the moment!) > > I?m currently looking at removing the sernet smb configs we had previously > and moving to IBM SMB. I?ve removed all the old packages and only now have > gpfs.smb installed on the systems. 
> > I?m struggling to get the config tools to work for our environment. > > We have MS Windows AD Domain for authentication. For various reasons, > however doesn?t hold the UIDs/GIDs, which are instead held in a different > LDAP directory. > > In the past, we?d configure the Linux servers running Samba so that NSLCD > was configured to get details from the LDAP server. (e.g. getent passwd > would return the data for an AD user). The Linux boxes would also be > configured to use KRB5 authentication where users were allowed to ssh etc > in for password authentication. > > So as far as Samba was concerned, it would do ?security = ADS? and then > we?d also have "idmap config * : backend = tdb2? > > I.e. Use Domain for authentication, but look locally for ID mapping data. > > Now I can configured IBM SMB to use ADS for authentication: > > mmuserauth service create --type ad --data-access-method file > --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF > --idmap-role subordinate > > > However I can?t see anyway for me to manipulate the config so that it > doesn?t use autorid. Using this we end up with: > > mmsmb config list | grep -i idmap > idmap config * : backend autorid > idmap config * : range 10000000-299999999 > idmap config * : rangesize 1000000 > idmap config * : read only yes > idmap:cache no > > > It also adds: > > mmsmb config list | grep -i auth > auth methods guest sam winbind > > (though I don?t think that is a problem). > > > I also can?t change the idmap using the mmsmb command (I think would look > like this): > # mmsmb config change --option="idmap config * : backend=tdb2" > idmap config * : backend=tdb2: [E] Unsupported smb option. More > information about smb options is availabe in the man page. > > > > I can?t see anything in the docs at: > http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect > rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm > > That give me a clue how to do what I want. > > I?d be happy to do some mixture of AD for authentication and LDAP for > lookups (rather than just falling back to ?local? from nslcd), but I can?t > see a way to do this, and ?manual? seems to stop ADS authentication in > Samba. > > Anyone got any suggestions? > > > Thanks > > Simon > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Tue Jul 7 12:39:24 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 7 Jul 2015 11:39:24 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Message-ID: So based on what I?m seeing ... When you run mmstartup, the start process edits /etc/nsswitch.conf. I?ve managed to make it work in my environment, but I had to edit the file /usr/lpp/mmfs/bin/mmcesop to make it put ldap instead of winbind when it starts up. I also had to do some studious use of "net conf delparm? ? Which is probably not a good idea. I did try using: mmuserauth service create --type userdefined --data-access-method file And the setting the "security = ADS? parameters by hand with "net conf? (can?t do it with mmsmb), and a manual ?net ads join" but I couldn?t get it to authenticate clients properly. I can?t work out why just at the moment. 
But even then when mmshutdown runs, it still goes ahead and edits /etc/nsswitch.conf I?ve got a ticket open with IBM at the moment via our integrator to see what they say. But I?m not sure I like something going off and poking things like /etc/nsswitch.conf at startup/shutdown. I can sorta see that at config time, but when service start etc, I?m not sure I really like that idea! Simon On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: >Just to chime in as another interested party, we do something fairly >similar but use sssd instead of nslcd. Very interested to see how >accommodating the IBM Samba is to local configuration needs. > >Best, >Kristy > >On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >Services) wrote: > >> Hi, >> >> (sorry, lots of questions about this stuff at the moment!) >> >> I?m currently looking at removing the sernet smb configs we had >>previously >> and moving to IBM SMB. I?ve removed all the old packages and only now >>have >> gpfs.smb installed on the systems. >> >> I?m struggling to get the config tools to work for our environment. >> >> We have MS Windows AD Domain for authentication. For various reasons, >> however doesn?t hold the UIDs/GIDs, which are instead held in a >>different >> LDAP directory. >> >> In the past, we?d configure the Linux servers running Samba so that >>NSLCD >> was configured to get details from the LDAP server. (e.g. getent passwd >> would return the data for an AD user). The Linux boxes would also be >> configured to use KRB5 authentication where users were allowed to ssh >>etc >> in for password authentication. >> >> So as far as Samba was concerned, it would do ?security = ADS? and then >> we?d also have "idmap config * : backend = tdb2? >> >> I.e. Use Domain for authentication, but look locally for ID mapping >>data. >> >> Now I can configured IBM SMB to use ADS for authentication: >> >> mmuserauth service create --type ad --data-access-method file >> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >> --idmap-role subordinate >> >> >> However I can?t see anyway for me to manipulate the config so that it >> doesn?t use autorid. Using this we end up with: >> >> mmsmb config list | grep -i idmap >> idmap config * : backend autorid >> idmap config * : range 10000000-299999999 >> idmap config * : rangesize 1000000 >> idmap config * : read only yes >> idmap:cache no >> >> >> It also adds: >> >> mmsmb config list | grep -i auth >> auth methods guest sam winbind >> >> (though I don?t think that is a problem). >> >> >> I also can?t change the idmap using the mmsmb command (I think would >>look >> like this): >> # mmsmb config change --option="idmap config * : backend=tdb2" >> idmap config * : backend=tdb2: [E] Unsupported smb option. More >> information about smb options is availabe in the man page. >> >> >> >> I can?t see anything in the docs at: >> >>http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>ct >> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >> >> That give me a clue how to do what I want. >> >> I?d be happy to do some mixture of AD for authentication and LDAP for >> lookups (rather than just falling back to ?local? from nslcd), but I >>can?t >> see a way to do this, and ?manual? seems to stop ADS authentication in >> Samba. >> >> Anyone got any suggestions? 
>> >> >> Thanks >> >> Simon >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From TROPPENS at de.ibm.com Thu Jul 9 07:55:24 2015 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Thu, 9 Jul 2015 08:55:24 +0200 Subject: [gpfsug-discuss] ISC 2015 Message-ID: Anybody at ISC 2015 in Frankfurt next week? I am happy to share my experience with supporting four ESP (a.k.a beta) customers of the new protocol feature. You can find me at the IBM booth (Booth 928). -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From daniel.kidger at uk.ibm.com Thu Jul 9 09:12:51 2015 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Thu, 9 Jul 2015 09:12:51 +0100 Subject: [gpfsug-discuss] ISC 2015 In-Reply-To: Message-ID: <1970894201.4637011436429559512.JavaMail.notes@d06wgw86.portsmouth.uk.ibm.com> Ulf, I am certainly interested. You can find me on the IBM booth too :-) Looking forward to meeting you. Daniel Sent from IBM Verse Ulf Troppens --- [gpfsug-discuss] ISC 2015 --- From:"Ulf Troppens" To:"gpfsug main discussion list" Date:Thu, 9 Jul 2015 08:55Subject:[gpfsug-discuss] ISC 2015 Anybody at ISC 2015 in Frankfurt next week? I am happy to share my experience with supporting four ESP (a.k.a beta) customers of the new protocol feature. You can find me at the IBM booth (Booth 928). -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Jul 9 15:56:42 2015 From: ewahl at osc.edu (Wahl, Edward) Date: Thu, 9 Jul 2015 14:56:42 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: , Message-ID: <9DA9EC7A281AC7428A9618AFDC49049955A5DCC4@CIO-KRC-D1MBX02.osuad.osu.edu> Please please please please PLEASE tell me that support for RHEL 6 is in the plan for protocol nodes. Forcing us to 7 seems rather VERY premature. been out sick a week so I just saw this, FYI. I'd sell my co-workers to test out protocol nodes, but frankly NOT on RHEL 7. Definitely NOT an HPC ready release. ugh. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Bob Oesterlin [oester at gmail.com] Sent: Thursday, July 02, 2015 12:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. 
The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. You do have the option of doing things manually as well. The guide that describes it is pretty good. If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Fri Jul 10 11:07:28 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Fri, 10 Jul 2015 11:07:28 +0100 Subject: [gpfsug-discuss] data interface and management infercace. Message-ID: <559F9960.7010509@ebi.ac.uk> Hello guys. Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. 
Our mmlscluster looks like this: GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: --daemon-interface={hostname | ip_address} Specifies the host name or IP address _*to be used by the GPFS daemons for node-to-node communication*_. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command tothe desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. What exactly means "node-to node-communications" ? Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? _**_ Thanks in advance, Salvatore -------------- next part -------------- An HTML attachment was scrubbed... URL: From secretary at gpfsug.org Fri Jul 10 12:33:48 2015 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Fri, 10 Jul 2015 12:33:48 +0100 Subject: [gpfsug-discuss] Places available: Meet the Devs Message-ID: Dear All, There are a couple of places remaining at the next 'Meet the Devs' event on Wednesday 29th July, 11am-3pm. The event is being held at IBM Warwick. The agenda promises to be hands on and give you the opportunity to speak face to face with the developers of GPFS. Guideline agenda: * Data analytic workloads - development to show and tell UK work on establishing use cases and tighter integration of Spark on top of GPFS * Show the GUI coming in 4.2 * Discuss 4.2 and beyond roadmap * How would you like IP management to work for protocol access? 
* Optional - Team can demo & discuss NFS/SMB/Object integration into Scale Lunch and refreshments will be provided. Please can you let me know by email if you are interested in attending and I'll register your place. Thanks and we hope to see you there! -- Claire O'Toole (n?e Robson) GPFS User Group Secretary +44 (0)7508 033896 www.gpfsug.org From S.J.Thompson at bham.ac.uk Fri Jul 10 12:59:19 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 10 Jul 2015 11:59:19 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: <9DA9EC7A281AC7428A9618AFDC49049955A5DCC4@CIO-KRC-D1MBX02.osuad.osu.edu> References: <9DA9EC7A281AC7428A9618AFDC49049955A5DCC4@CIO-KRC-D1MBX02.osuad.osu.edu> Message-ID: Hi Ed, Well, technically: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_protocolsprerequisites.htm Says "The spectrumscale installation toolkit supports Red Hat Enterprise Linux 7.0 and 7.1 platforms on x86_64 and ppc64 architectures" So maybe if you don?t want to use the installer, you don't need RHEL 7. Of course where or not that is supported, only IBM would be able to say ? I?ve only looked at gpfs.smb, but as its provided as a binary RPM, it might or might not work in a 6 environment (it bundles ctdb etc all in). For object, as its a bundle of openstack RPMs, then potentially it won?t work on EL6 depending on the python requirements? And surely you aren?t running protocol support on HPC nodes anyway ... so maybe a few EL7 nodes could work for you? Simon From: , Edward > Reply-To: gpfsug main discussion list > Date: Thursday, 9 July 2015 15:56 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Please please please please PLEASE tell me that support for RHEL 6 is in the plan for protocol nodes. Forcing us to 7 seems rather VERY premature. been out sick a week so I just saw this, FYI. I'd sell my co-workers to test out protocol nodes, but frankly NOT on RHEL 7. Definitely NOT an HPC ready release. ugh. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Bob Oesterlin [oester at gmail.com] Sent: Thursday, July 02, 2015 12:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. You do have the option of doing things manually as well. The guide that describes it is pretty good. If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. 
Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 10 13:06:01 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 10 Jul 2015 12:06:01 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Message-ID: So IBM came back and said what I was doing wasn?t supported. They did say that you can use ?user defined? authentication. Which I?ve got working now on my environment (figured what I was doing wrong, and you can?t use mmsmb to do some of the bits I need for it to work for user defined mode for me...). But I still think it needs a patch to one of the files for CES for use in user defined authentication. (Right now it appears to remove all my ?user defined? settings from nsswitch.conf when you stop CES/GPFS on a node). I?ve supplied my patch to IBM which works for my case, we?ll see what they do about it? (If people are interested, I?ll gather my notes into a blog post). Simon On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: >Just to chime in as another interested party, we do something fairly >similar but use sssd instead of nslcd. Very interested to see how >accommodating the IBM Samba is to local configuration needs. > >Best, >Kristy > >On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >Services) wrote: > >> Hi, >> >> (sorry, lots of questions about this stuff at the moment!) >> >> I?m currently looking at removing the sernet smb configs we had >>previously >> and moving to IBM SMB. I?ve removed all the old packages and only now >>have >> gpfs.smb installed on the systems. >> >> I?m struggling to get the config tools to work for our environment. >> >> We have MS Windows AD Domain for authentication. For various reasons, >> however doesn?t hold the UIDs/GIDs, which are instead held in a >>different >> LDAP directory. >> >> In the past, we?d configure the Linux servers running Samba so that >>NSLCD >> was configured to get details from the LDAP server. (e.g. getent passwd >> would return the data for an AD user). The Linux boxes would also be >> configured to use KRB5 authentication where users were allowed to ssh >>etc >> in for password authentication. >> >> So as far as Samba was concerned, it would do ?security = ADS? and then >> we?d also have "idmap config * : backend = tdb2? >> >> I.e. Use Domain for authentication, but look locally for ID mapping >>data. 
>> >> Now I can configured IBM SMB to use ADS for authentication: >> >> mmuserauth service create --type ad --data-access-method file >> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >> --idmap-role subordinate >> >> >> However I can?t see anyway for me to manipulate the config so that it >> doesn?t use autorid. Using this we end up with: >> >> mmsmb config list | grep -i idmap >> idmap config * : backend autorid >> idmap config * : range 10000000-299999999 >> idmap config * : rangesize 1000000 >> idmap config * : read only yes >> idmap:cache no >> >> >> It also adds: >> >> mmsmb config list | grep -i auth >> auth methods guest sam winbind >> >> (though I don?t think that is a problem). >> >> >> I also can?t change the idmap using the mmsmb command (I think would >>look >> like this): >> # mmsmb config change --option="idmap config * : backend=tdb2" >> idmap config * : backend=tdb2: [E] Unsupported smb option. More >> information about smb options is availabe in the man page. >> >> >> >> I can?t see anything in the docs at: >> >>http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>ct >> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >> >> That give me a clue how to do what I want. >> >> I?d be happy to do some mixture of AD for authentication and LDAP for >> lookups (rather than just falling back to ?local? from nslcd), but I >>can?t >> see a way to do this, and ?manual? seems to stop ADS authentication in >> Samba. >> >> Anyone got any suggestions? >> >> >> Thanks >> >> Simon >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Daniel.Vogel at abcsystems.ch Fri Jul 10 15:19:11 2015 From: Daniel.Vogel at abcsystems.ch (Daniel Vogel) Date: Fri, 10 Jul 2015 14:19:11 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <201507032249.t63Mnffp025995@d03av03.boulder.ibm.com> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch><201507011422.t61EMZmw011626@d01av01.pok.ibm.com> <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> <201507032249.t63Mnffp025995@d03av03.boulder.ibm.com> Message-ID: <2CDF270206A255459AC4FA6B08E52AF90114635E8E@ABCSYSEXC1.abcsystems.ch> For ?1? we use the quorum node to do ?start disk? or ?restripe file system? (quorum node without disks). For ?2? we use kernel NFS with cNFS I used the command ?cnfsNFSDprocs 64? to set the NFS threads. Is this correct? gpfs01:~ # cat /proc/fs/nfsd/threads 64 I will verify the settings in our lab, will use the following configuration: mmchconfig worker1Threads=128 mmchconfig prefetchThreads=128 mmchconfig nsdMaxWorkerThreads=128 mmchconfig cnfsNFSDprocs=256 daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Samstag, 4. Juli 2015 00:49 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? this triggers a few questions 1. have you tried running it only on a node that doesn't serve NFS data ? 2. what NFS stack are you using ? is this the kernel NFS Server as part of linux means you use cNFS ? if the answer to 2 is yes, have you adjusted the nfsd threads in /etc/sysconfig/nfs ? 
the default is only 8 and if you run with the default you have a very low number of threads from the outside competing with a larger number of threads doing restripe, increasing the nfsd threads could help. you could also reduce the number of internal restripe threads to try out if that helps mitigating the impact. to try an extreme low value set the following : mmchconfig pitWorkerThreadsPerNode=1 -i and retry the restripe again, to reset it back to default run mmchconfig pitWorkerThreadsPerNode=DEFAULT -i sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Beschreibung: Inactive hide details for Daniel Vogel ---07/02/2015 12:12:46 AM---Sven, Yes I agree, but ?using ?N? to reduce the load help]Daniel Vogel ---07/02/2015 12:12:46 AM---Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as From: Daniel Vogel > To: "'gpfsug main discussion list'" > Date: 07/02/2015 12:12 AM Subject: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Beschreibung: Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS]Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. 
I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 10 15:56:04 2015 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 10 Jul 2015 14:56:04 +0000 Subject: [gpfsug-discuss] Fwd: GPFS 4.1, NFSv4, and authenticating against AD References: <69C83493-2E22-4B11-BF15-A276DA6D4901@vanderbilt.edu> Message-ID: <55426129-67A0-4071-91F4-715BAC1F0DBE@vanderbilt.edu> Begin forwarded message: From: buterbkl > Subject: GPFS 4.1, NFSv4, and authenticating against AD Date: July 10, 2015 at 9:52:38 AM CDT To: gpfs-general at sdsc.edu Hi All, We are under the (hopefully not mistaken) impression that with GPFS 4.1 supporting NFSv4 it should be possible to have a CNFS setup authenticate against Active Directory as long as you use NFSv4. I also thought that I had seen somewhere (possibly one of the two GPFS related mailing lists I?m on, or in a DeveloperWorks article, or ???) that IBM has published documentation on how to set this up (a kind of cookbook). I?ve done a fair amount of Googling looking for such a document, but I seem to be uniquely talented in not being able to find things with Google! :-( Does anyone know of such a document and could send me the link to it? It would be very helpful to us as I?ve got essentially zero experience with Kerberos (which I think is required to talk to AD) and the institutions? AD environment is managed by a separate department. Thanks in advance? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Mon Jul 13 13:31:18 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Mon, 13 Jul 2015 13:31:18 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <559F9960.7010509@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> Message-ID: <55A3AF96.3060303@ebi.ac.uk> Anyone? On 10/07/15 11:07, Salvatore Di Nardo wrote: > Hello guys. > Quite a while ago i mentioned that we have a big expel issue on our > gss ( first gen) and white a lot people suggested that the root cause > could be that we use the same interface for all the traffic, and that > we should split the data network from the admin network. 
Finally we > could plan a downtime and we are migrating the data out so, i can soon > safelly play with the change, but looking what exactly i should to do > i'm a bit puzzled. Our mmlscluster looks like this: > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name > Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk > quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk > quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk > quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk > quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk > quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk > quorum-manager > > > It was my understanding that the "admin node" should use a different > interface ( a 1g link copper should be fine), while the daemon node is > where the data was passing , so should point to the bonded 10g > interfaces. but when i read the mmchnode man page i start to be quite > confused. It says: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address _*to > be used by the GPFS daemons for node-to-node communication*_. The > host name or IP address must refer to the communication adapter over > which the GPFS daemons communicate. > Alias interfaces are not allowed. Use the > original address or a name that is resolved by the host command to > that original address. > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by > GPFS administration commands when communicating between nodes. The > admin node name must be specified as an IP address or a hostname that > is resolved by the host command > tothe desired IP address. If the keyword > DEFAULT is specified, the admin interface for the node is set to be > equal to the daemon interface for the node. > > What exactly means "node-to node-communications" ? > Means DATA or also the "lease renew", and the token communication > between the clients to get/steal the locks to be able to manage > concurrent write to thr same file? > Since we are getting expells ( especially when several clients > contends the same file ) i assumed i have to split this type of > packages from the data stream, but reading the documentation it looks > to me that those internal comunication between nodes use the > daemon-interface wich i suppose are used also for the data. so HOW > exactly i can split them? > > > Thanks in advance, > Salvatore > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Mon Jul 13 14:29:50 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Mon, 13 Jul 2015 14:29:50 +0100 Subject: [gpfsug-discuss] data interface and management infercace. 
In-Reply-To: <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> Message-ID: <55A3BD4E.3000205@ebi.ac.uk> Hello Vic. We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: --daemon-interface={hostname | ip_address} Specifies the host name or IP address _*to be used by the GPFS daemons for node-to-node communication*_. The host name or IP address must refer to the commu- nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. Regards, Salvatore On 13/07/15 14:00, Vic Cornell wrote: > Hi Salavatore, > > Does your GSS have the facility for a 1GbE ?management? network? If so > I think that changing the ?admin? node names of the cluster members to > a set of IPs on the management network would give you the split that > you need. > > What about the clients? Can they also connect to a separate admin network? > > Remember that if you are using multi-cluster all of the nodes in both > networks must share the same admin network. > > Kind Regards, > > Vic > > >> On 13 Jul 2015, at 13:31, Salvatore Di Nardo > > wrote: >> >> Anyone? >> >> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>> Hello guys. >>> Quite a while ago i mentioned that we have a big expel issue on our >>> gss ( first gen) and white a lot people suggested that the root >>> cause could be that we use the same interface for all the traffic, >>> and that we should split the data network from the admin network. >>> Finally we could plan a downtime and we are migrating the data out >>> so, i can soon safelly play with the change, but looking what >>> exactly i should to do i'm a bit puzzled. 
Our mmlscluster looks like >>> this: >>> >>> GPFS cluster information >>> ======================== >>> GPFS cluster name: GSS.ebi.ac.uk >>> GPFS cluster id: 17987981184946329605 >>> GPFS UID domain: GSS.ebi.ac.uk >>> Remote shell command: /usr/bin/ssh >>> Remote file copy command: /usr/bin/scp >>> >>> GPFS cluster configuration servers: >>> ----------------------------------- >>> Primary server: gss01a.ebi.ac.uk >>> Secondary server: gss02b.ebi.ac.uk >>> >>> Node Daemon node name IP address Admin node >>> name Designation >>> ----------------------------------------------------------------------- >>> 1 gss01a.ebi.ac.uk >>> 10.7.28.2 gss01a.ebi.ac.uk >>> quorum-manager >>> 2 gss01b.ebi.ac.uk >>> 10.7.28.3 gss01b.ebi.ac.uk >>> quorum-manager >>> 3 gss02a.ebi.ac.uk >>> 10.7.28.67 gss02a.ebi.ac.uk >>> quorum-manager >>> 4 gss02b.ebi.ac.uk >>> 10.7.28.66 gss02b.ebi.ac.uk >>> quorum-manager >>> 5 gss03a.ebi.ac.uk >>> 10.7.28.34 gss03a.ebi.ac.uk >>> quorum-manager >>> 6 gss03b.ebi.ac.uk >>> 10.7.28.35 gss03b.ebi.ac.uk >>> quorum-manager >>> >>> >>> It was my understanding that the "admin node" should use a different >>> interface ( a 1g link copper should be fine), while the daemon node >>> is where the data was passing , so should point to the bonded 10g >>> interfaces. but when i read the mmchnode man page i start to be >>> quite confused. It says: >>> >>> --daemon-interface={hostname | ip_address} >>> Specifies the host name or IP address >>> _*to be used by the GPFS daemons for node-to-node communication*_. >>> The host name or IP address must refer to the communication adapter >>> over which the GPFS daemons communicate. >>> Alias interfaces are not allowed. Use the >>> original address or a name that is resolved by the host command to >>> that original address. >>> >>> --admin-interface={hostname | ip_address} >>> Specifies the name of the node to be used >>> by GPFS administration commands when communicating between nodes. >>> The admin node name must be specified as an IP address or a hostname >>> that is resolved by the host command >>> tothe desired IP address. If the keyword >>> DEFAULT is specified, the admin interface for the node is set to be >>> equal to the daemon interface for the node. >>> >>> What exactly means "node-to node-communications" ? >>> Means DATA or also the "lease renew", and the token communication >>> between the clients to get/steal the locks to be able to manage >>> concurrent write to thr same file? >>> Since we are getting expells ( especially when several clients >>> contends the same file ) i assumed i have to split this type of >>> packages from the data stream, but reading the documentation it >>> looks to me that those internal comunication between nodes use the >>> daemon-interface wich i suppose are used also for the data. so HOW >>> exactly i can split them? >>> >>> >>> Thanks in advance, >>> Salvatore >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss atgpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From viccornell at gmail.com Mon Jul 13 15:25:32 2015 From: viccornell at gmail.com (Vic Cornell) Date: Mon, 13 Jul 2015 15:25:32 +0100 Subject: [gpfsug-discuss] data interface and management infercace. 
In-Reply-To: <55A3BD4E.3000205@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: Hi Salvatore, I agree that that is what the manual - and some of the wiki entries say. However , when we have had problems (typically congestion) with ethernet networks in the past (20GbE or 40GbE) we have resolved them by setting up a separate ?Admin? network. The before and after cluster health we have seen measured in number of expels and waiters has been very marked. Maybe someone ?in the know? could comment on this split. Regards, Vic > On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: > > Hello Vic. > We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP > address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the > node is set to be equal to the daemon interface for the node. > > So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the commu- > nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the > host command to that original address. > > The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. > > > Regards, > Salvatore > > > > On 13/07/15 14:00, Vic Cornell wrote: >> Hi Salavatore, >> >> Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. >> >> What about the clients? Can they also connect to a separate admin network? >> >> Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. >> >> Kind Regards, >> >> Vic >> >> >>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo > wrote: >>> >>> Anyone? >>> >>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>> Hello guys. >>>> Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. 
Our mmlscluster looks like this: >>>> >>>> GPFS cluster information >>>> ======================== >>>> GPFS cluster name: GSS.ebi.ac.uk >>>> GPFS cluster id: 17987981184946329605 >>>> GPFS UID domain: GSS.ebi.ac.uk >>>> Remote shell command: /usr/bin/ssh >>>> Remote file copy command: /usr/bin/scp >>>> >>>> GPFS cluster configuration servers: >>>> ----------------------------------- >>>> Primary server: gss01a.ebi.ac.uk >>>> Secondary server: gss02b.ebi.ac.uk >>>> >>>> Node Daemon node name IP address Admin node name Designation >>>> ----------------------------------------------------------------------- >>>> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >>>> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >>>> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >>>> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >>>> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >>>> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >>>> >>>> It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: >>>> >>>> --daemon-interface={hostname | ip_address} >>>> Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. >>>> Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. >>>> >>>> --admin-interface={hostname | ip_address} >>>> Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command >>>> to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. >>>> >>>> What exactly means "node-to node-communications" ? >>>> Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? >>>> Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? >>>> >>>> >>>> Thanks in advance, >>>> Salvatore >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jhick at lbl.gov Mon Jul 13 16:22:58 2015 From: jhick at lbl.gov (Jason Hick) Date: Mon, 13 Jul 2015 08:22:58 -0700 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: Hi, Yes having separate data and management networks has been critical for us for keeping health monitoring/communication unimpeded by data movement. Not as important, but you can also tune the networks differently (packet sizes, buffer sizes, SAK, etc) which can help. Jason > On Jul 13, 2015, at 7:25 AM, Vic Cornell wrote: > > Hi Salvatore, > > I agree that that is what the manual - and some of the wiki entries say. > > However , when we have had problems (typically congestion) with ethernet networks in the past (20GbE or 40GbE) we have resolved them by setting up a separate ?Admin? network. > > The before and after cluster health we have seen measured in number of expels and waiters has been very marked. > > Maybe someone ?in the know? could comment on this split. > > Regards, > > Vic > > >> On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: >> >> Hello Vic. >> We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: >> >> --admin-interface={hostname | ip_address} >> Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP >> address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the >> node is set to be equal to the daemon interface for the node. >> >> So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: >> >> --daemon-interface={hostname | ip_address} >> Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the commu- >> nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the >> host command to that original address. >> >> The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. >> >> >> Regards, >> Salvatore >> >> >> >>> On 13/07/15 14:00, Vic Cornell wrote: >>> Hi Salavatore, >>> >>> Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. >>> >>> What about the clients? Can they also connect to a separate admin network? >>> >>> Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. >>> >>> Kind Regards, >>> >>> Vic >>> >>> >>>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo wrote: >>>> >>>> Anyone? >>>> >>>>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>>> Hello guys. 
>>>>> Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: >>>>> >>>>> GPFS cluster information >>>>> ======================== >>>>> GPFS cluster name: GSS.ebi.ac.uk >>>>> GPFS cluster id: 17987981184946329605 >>>>> GPFS UID domain: GSS.ebi.ac.uk >>>>> Remote shell command: /usr/bin/ssh >>>>> Remote file copy command: /usr/bin/scp >>>>> >>>>> GPFS cluster configuration servers: >>>>> ----------------------------------- >>>>> Primary server: gss01a.ebi.ac.uk >>>>> Secondary server: gss02b.ebi.ac.uk >>>>> >>>>> Node Daemon node name IP address Admin node name Designation >>>>> ----------------------------------------------------------------------- >>>>> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >>>>> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >>>>> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >>>>> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >>>>> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >>>>> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >>>>> >>>>> It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: >>>>> >>>>> --daemon-interface={hostname | ip_address} >>>>> Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. >>>>> Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. >>>>> >>>>> --admin-interface={hostname | ip_address} >>>>> Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command >>>>> to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. >>>>> >>>>> What exactly means "node-to node-communications" ? >>>>> Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? >>>>> Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? 
>>>>> >>>>> >>>>> Thanks in advance, >>>>> Salvatore >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdenham at gmail.com Mon Jul 13 17:45:48 2015 From: sdenham at gmail.com (Scott D) Date: Mon, 13 Jul 2015 11:45:48 -0500 Subject: [gpfsug-discuss] data interface and management infercace. Message-ID: I spent a good deal of time exploring this topic when I was at IBM. I think there are two key aspects here; the congestion of the actual interfaces on the [cluster, FS, token] management nodes and competition for other resources like CPU cycles on those nodes. When using a single Ethernet interface (or for that matter IB RDMA + IPoIB over the same interface), at some point the two kinds of traffic begin to conflict. The management traffic being much more time sensitive suffers as a result. One solution is to separate the traffic. For larger clusters though (1000s of nodes), a better solution, that may avoid having to have a 2nd interface on every client node, is to add dedicated nodes as managers and not rely on NSD servers for this. It does cost you some modest servers and GPFS server licenses. My previous client generally used previous-generation retired compute nodes for this job. Scott Date: Mon, 13 Jul 2015 15:25:32 +0100 > From: Vic Cornell > Subject: Re: [gpfsug-discuss] data interface and management infercace. > > Hi Salvatore, > > I agree that that is what the manual - and some of the wiki entries say. > > However , when we have had problems (typically congestion) with ethernet > networks in the past (20GbE or 40GbE) we have resolved them by setting up a > separate ?Admin? network. > > The before and after cluster health we have seen measured in number of > expels and waiters has been very marked. > > Maybe someone ?in the know? could comment on this split. > > Regards, > > Vic > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mhabib73 at gmail.com Mon Jul 13 18:19:36 2015 From: mhabib73 at gmail.com (Muhammad Habib) Date: Mon, 13 Jul 2015 13:19:36 -0400 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: Did you look at "subnets" parameter used with "mmchconfig" command. I think you can use order list of subnets for daemon communication and then actual daemon interface can be used for data transfer. When the GPFS will start it will use actual daemon interface for communication , however , once its started , it will use the IPs from the subnet list whichever coming first in the list. To further validate , you can put network sniffer before you do actual implementation or alternatively you can open a PMR with IBM. 
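As a rough illustration only (the data subnet below is a placeholder and the
cluster name is taken from the earlier mmlscluster output - this is not a
tested configuration), the shape of the change would be something like:

# prefer addresses on the 10.30.22.0 data network for daemon-to-daemon traffic
# between nodes of cluster GSS.ebi.ac.uk that have an address on that subnet
mmchconfig subnets="10.30.22.0/GSS.ebi.ac.uk"
mmlsconfig | grep subnets

As far as I know the subnets setting only takes effect once the daemons are
restarted, so it is best planned together with a maintenance window.
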
If your cluster having expel situation , you may fine tune your cluster e.g. increase ping timeout period , having multiple NSD servers and distributing filesystems across these NSD servers. Also critical servers can have HBA cards installed for direct I/O through fiber. Thanks On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick wrote: > Hi, > > Yes having separate data and management networks has been critical for us > for keeping health monitoring/communication unimpeded by data movement. > > Not as important, but you can also tune the networks differently (packet > sizes, buffer sizes, SAK, etc) which can help. > > Jason > > On Jul 13, 2015, at 7:25 AM, Vic Cornell wrote: > > Hi Salvatore, > > I agree that that is what the manual - and some of the wiki entries say. > > However , when we have had problems (typically congestion) with ethernet > networks in the past (20GbE or 40GbE) we have resolved them by setting up a > separate ?Admin? network. > > The before and after cluster health we have seen measured in number of > expels and waiters has been very marked. > > Maybe someone ?in the know? could comment on this split. > > Regards, > > Vic > > > On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: > > Hello Vic. > We are currently draining our gpfs to do all the recabling to add a > management network, but looking what the admin interface does ( man > mmchnode ) it says something different: > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS > administration commands when communicating between nodes. The admin node > name must be specified as an IP > address or a hostname that is resolved by the > host command to the desired IP address. If the keyword DEFAULT is > specified, the admin interface for the > node is set to be equal to the daemon interface > for the node. > > > So, seems used only for commands propagation, hence have nothing to do > with the node-to-node traffic. Infact the other interface description is: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address *to be > used by the GPFS daemons for node-to-node communication*. The host name > or IP address must refer to the commu- > nication adapter over which the GPFS daemons > communicate. Alias interfaces are not allowed. Use the original address or > a name that is resolved by the > host command to that original address. > > > The "expired lease" issue and file locking mechanism a( most of our > expells happens when 2 clients try to write in the same file) are exactly > node-to node-comunication, so im wondering what's the point to separate > the "admin network". I want to be sure to plan the right changes before we > do a so massive task. We are talking about adding a new interface on 700 > clients, so the recabling work its not small. > > > Regards, > Salvatore > > > > On 13/07/15 14:00, Vic Cornell wrote: > > Hi Salavatore, > > Does your GSS have the facility for a 1GbE ?management? network? If so I > think that changing the ?admin? node names of the cluster members to a set > of IPs on the management network would give you the split that you need. > > What about the clients? Can they also connect to a separate admin > network? > > Remember that if you are using multi-cluster all of the nodes in both > networks must share the same admin network. > > Kind Regards, > > Vic > > > On 13 Jul 2015, at 13:31, Salvatore Di Nardo wrote: > > Anyone? > > On 10/07/15 11:07, Salvatore Di Nardo wrote: > > Hello guys. 
> Quite a while ago i mentioned that we have a big expel issue on our gss ( > first gen) and white a lot people suggested that the root cause could be > that we use the same interface for all the traffic, and that we should > split the data network from the admin network. Finally we could plan a > downtime and we are migrating the data out so, i can soon safelly play with > the change, but looking what exactly i should to do i'm a bit puzzled. Our > mmlscluster looks like this: > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > It was my understanding that the "admin node" should use a different > interface ( a 1g link copper should be fine), while the daemon node is > where the data was passing , so should point to the bonded 10g interfaces. > but when i read the mmchnode man page i start to be quite confused. It says: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address *to be > used by the GPFS daemons for node-to-node communication*. The host name > or IP address must refer to the communication adapter over which the GPFS > daemons communicate. > Alias interfaces are not allowed. Use the > original address or a name that is resolved by the host command to that > original address. > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS > administration commands when communicating between nodes. The admin node > name must be specified as an IP address or a hostname that is resolved by > the host command > to the desired IP address. If the keyword > DEFAULT is specified, the admin interface for the node is set to be equal > to the daemon interface for the node. > > What exactly means "node-to node-communications" ? > Means DATA or also the "lease renew", and the token communication between > the clients to get/steal the locks to be able to manage concurrent write to > thr same file? > Since we are getting expells ( especially when several clients contends > the same file ) i assumed i have to split this type of packages from the > data stream, but reading the documentation it looks to me that those > internal comunication between nodes use the daemon-interface wich i suppose > are used also for the data. so HOW exactly i can split them? 
> > Thanks in advance, > Salvatore > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This communication contains confidential information intended only for the persons to whom it is addressed. Any other distribution, copying or disclosure is strictly prohibited. If you have received this communication in error, please notify the sender and delete this e-mail message immediately. Le présent message contient des renseignements de nature confidentielle réservés uniquement à l'usage du destinataire. Toute diffusion, distribution, divulgation, utilisation ou reproduction de la présente communication, et de tout fichier qui y est joint, est strictement interdite. Si vous avez reçu le présent message électronique par erreur, veuillez informer immédiatement l'expéditeur et supprimer le message de votre ordinateur et de votre serveur. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Mon Jul 13 18:42:47 2015 From: oester at gmail.com (Bob Oesterlin) Date: Mon, 13 Jul 2015 12:42:47 -0500 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: Message-ID: Some thoughts on node expels, based on the last 2-3 months of "expel hell" here. We've spent a lot of time looking at this issue, across multiple clusters. A big thanks to IBM for helping us center in on the right issues. First, you need to understand if the expels are due to an "expired lease" message, or expels due to "communication issues". It sounds like you are talking about the latter. In the case of nodes being expelled due to communication issues, it's more likely the problem is related to network congestion. This can occur at many levels - the node, the network, or the switch. When it's a communication issue, changing params like "missed ping timeout" isn't going to help you. The problem for us ended up being that GPFS wasn't getting a response to a periodic "keep alive" poll to the node, and after 300 seconds, it declared the node dead and expelled it. You can tell if this is the issue by starting to look at the RPC waiters just before the expel. If you see something like a "Waiting for poll on sock" RPC, that means the node is waiting for that periodic poll to return, and it's not seeing it. The response is either lost in the network, sitting on the network queue, or the node is too busy to send it. You may also see RPCs like "waiting for exclusive use of connection" - this is another clear indication of network congestion. Look at the GPFSUG presentations (http://www.gpfsug.org/presentations/) for one by Jason Hick (NERSC) - he also talks about these issues.
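To spot the waiters described here, something like the following should work on a suspect node or on the cluster manager (a sketch only; the grep patterns are simply the waiter names quoted above, and the output format varies between releases):

  # Dump the current waiters on this node (run as root), ideally around the time of the expel.
  mmdiag --waiters

  # Narrow the output to the network-related waiters mentioned above.
  mmdiag --waiters | grep -iE "poll on sock|exclusive use of connection"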
You need to take a look at net.ipv4.tcp_wmem and net.ipv4.tcp_rmem, especially if you have client nodes that are on slower network interfaces. In our case, it was a number of factors - adjusting these settings, looking at congestion at the switch level, and some physical hardware issues. I would be happy to discuss in more detail (offline) if you want). There are no simple solutions. :-) Bob Oesterlin, Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Mon, Jul 13, 2015 at 11:45 AM, Scott D wrote: > I spent a good deal of time exploring this topic when I was at IBM. I > think there are two key aspects here; the congestion of the actual > interfaces on the [cluster, FS, token] management nodes and competition for > other resources like CPU cycles on those nodes. When using a single > Ethernet interface (or for that matter IB RDMA + IPoIB over the same > interface), at some point the two kinds of traffic begin to conflict. The > management traffic being much more time sensitive suffers as a result. One > solution is to separate the traffic. For larger clusters though (1000s of > nodes), a better solution, that may avoid having to have a 2nd interface on > every client node, is to add dedicated nodes as managers and not rely on > NSD servers for this. It does cost you some modest servers and GPFS server > licenses. My previous client generally used previous-generation retired > compute nodes for this job. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hagley at cscs.ch Tue Jul 14 08:31:04 2015 From: hagley at cscs.ch (Hagley Birgit) Date: Tue, 14 Jul 2015 07:31:04 +0000 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A3BD4E.3000205@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com>, <55A3BD4E.3000205@ebi.ac.uk> Message-ID: <97B2355E006F044E9B8518711889B13719CF3810@MBX114.d.ethz.ch> Hello Salvatore, as you wrote that you have about 700 clients, maybe also the tuning recommendations for large GPFS clusters are helpful for you. They are on the developerworks GPFS wiki: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20Network%20Tuning To my experience especially "failureDetectionTime" and "minMissedPingTimeout" may help in case of expelled nodes. In case you use InfiniBand, for RDMA, there also is a "Best Practices RDMA Tuning" page: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20RDMA%20Tuning Regards Birgit ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Monday, July 13, 2015 3:29 PM To: Vic Cornell Cc: gpfsug main discussion list Subject: Re: [gpfsug-discuss] data interface and management infercace. Hello Vic. We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. 
If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the commu- nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. Regards, Salvatore On 13/07/15 14:00, Vic Cornell wrote: Hi Salavatore, Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. What about the clients? Can they also connect to a separate admin network? Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. Kind Regards, Vic On 13 Jul 2015, at 13:31, Salvatore Di Nardo > wrote: Anyone? On 10/07/15 11:07, Salvatore Di Nardo wrote: Hello guys. Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. 
The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. What exactly means "node-to node-communications" ? Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? Thanks in advance, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Jul 14 09:15:26 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Jul 2015 09:15:26 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <97B2355E006F044E9B8518711889B13719CF3810@MBX114.d.ethz.ch> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com>, <55A3BD4E.3000205@ebi.ac.uk> <97B2355E006F044E9B8518711889B13719CF3810@MBX114.d.ethz.ch> Message-ID: <55A4C51E.8050606@ebi.ac.uk> Thanks, this has already been done ( without too much success). We need to rearrange the networking and since somebody experience was to add a copper interface for management i want to do the same, so i'm digging a bit to aundertsand the best way yo do it. Regards, Salvatore On 14/07/15 08:31, Hagley Birgit wrote: > Hello Salvatore, > > as you wrote that you have about 700 clients, maybe also the tuning > recommendations for large GPFS clusters are helpful for you. They are > on the developerworks GPFS wiki: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20Network%20Tuning > > > > To my experience especially "failureDetectionTime" and > "minMissedPingTimeout" may help in case of expelled nodes. 
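For reference, the two tunables named above are ordinary mmchconfig settings; a sketch with illustrative values only (check the wiki page linked above for guidance, and note that failureDetectionTime can normally only be changed while GPFS is down on the affected nodes):

  # Show the current values, if they have been changed from the defaults.
  mmlsconfig | grep -iE "failureDetectionTime|minMissedPingTimeout"

  # Illustrative values only; pick numbers appropriate to your network.
  mmchconfig minMissedPingTimeout=60
  mmchconfig failureDetectionTime=60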
> > > In case you use InfiniBand, for RDMA, there also is a "Best Practices > RDMA Tuning" page: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20RDMA%20Tuning > > > > > Regards > Birgit > > ------------------------------------------------------------------------ > *From:* gpfsug-discuss-bounces at gpfsug.org > [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo > [sdinardo at ebi.ac.uk] > *Sent:* Monday, July 13, 2015 3:29 PM > *To:* Vic Cornell > *Cc:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] data interface and management infercace. > > Hello Vic. > We are currently draining our gpfs to do all the recabling to add a > management network, but looking what the admin interface does ( man > mmchnode ) it says something different: > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS > administration commands when communicating between nodes. The > admin node name must be specified as an IP > address or a hostname that is resolved by the host command to > the desired IP address. If the keyword DEFAULT is specified, > the admin interface for the > node is set to be equal to the daemon interface for the node. > > > So, seems used only for commands propagation, hence have nothing to > do with the node-to-node traffic. Infact the other interface > description is: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address _*to be used by the GPFS > daemons for node-to-node communication*_. The host name or IP > address must refer to the commu- > nication adapter over which the GPFS daemons communicate. > Alias interfaces are not allowed. Use the original address or > a name that is resolved by the > host command to that original address. > > > The "expired lease" issue and file locking mechanism a( most of our > expells happens when 2 clients try to write in the same file) are > exactly node-to node-comunication, so im wondering what's the point to > separate the "admin network". I want to be sure to plan the right > changes before we do a so massive task. We are talking about adding a > new interface on 700 clients, so the recabling work its not small. > > > Regards, > Salvatore > > > > On 13/07/15 14:00, Vic Cornell wrote: >> Hi Salavatore, >> >> Does your GSS have the facility for a 1GbE ?management? network? If >> so I think that changing the ?admin? node names of the cluster >> members to a set of IPs on the management network would give you the >> split that you need. >> >> What about the clients? Can they also connect to a separate admin >> network? >> >> Remember that if you are using multi-cluster all of the nodes in both >> networks must share the same admin network. >> >> Kind Regards, >> >> Vic >> >> >>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo >> > wrote: >>> >>> Anyone? >>> >>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>> Hello guys. >>>> Quite a while ago i mentioned that we have a big expel issue on >>>> our gss ( first gen) and white a lot people suggested that the root >>>> cause could be that we use the same interface for all the traffic, >>>> and that we should split the data network from the admin network. >>>> Finally we could plan a downtime and we are migrating the data out >>>> so, i can soon safelly play with the change, but looking what >>>> exactly i should to do i'm a bit puzzled. 
Our mmlscluster looks >>>> like this: >>>> >>>> GPFS cluster information >>>> ======================== >>>> GPFS cluster name: GSS.ebi.ac.uk >>>> GPFS cluster id: 17987981184946329605 >>>> GPFS UID domain: GSS.ebi.ac.uk >>>> Remote shell command: /usr/bin/ssh >>>> Remote file copy command: /usr/bin/scp >>>> >>>> GPFS cluster configuration servers: >>>> ----------------------------------- >>>> Primary server: gss01a.ebi.ac.uk >>>> Secondary server: gss02b.ebi.ac.uk >>>> >>>> >>>> Node Daemon node name IP address Admin node >>>> name Designation >>>> ----------------------------------------------------------------------- >>>> 1 gss01a.ebi.ac.uk >>>> 10.7.28.2 gss01a.ebi.ac.uk >>>> quorum-manager >>>> 2 gss01b.ebi.ac.uk >>>> 10.7.28.3 gss01b.ebi.ac.uk >>>> quorum-manager >>>> 3 gss02a.ebi.ac.uk >>>> 10.7.28.67 gss02a.ebi.ac.uk >>>> quorum-manager >>>> 4 gss02b.ebi.ac.uk >>>> 10.7.28.66 gss02b.ebi.ac.uk >>>> quorum-manager >>>> 5 gss03a.ebi.ac.uk >>>> 10.7.28.34 gss03a.ebi.ac.uk >>>> quorum-manager >>>> 6 gss03b.ebi.ac.uk >>>> 10.7.28.35 gss03b.ebi.ac.uk >>>> quorum-manager >>>> >>>> >>>> It was my understanding that the "admin node" should use a >>>> different interface ( a 1g link copper should be fine), while the >>>> daemon node is where the data was passing , so should point to the >>>> bonded 10g interfaces. but when i read the mmchnode man page i >>>> start to be quite confused. It says: >>>> >>>> --daemon-interface={hostname | ip_address} >>>> Specifies the host name or IP address _*to be used by the GPFS >>>> daemons for node-to-node communication*_. The host name or IP >>>> address must refer to the communication adapter over which the GPFS >>>> daemons communicate. >>>> Alias interfaces are not allowed. Use the >>>> original address or a name that is resolved by the host command to >>>> that original address. >>>> >>>> --admin-interface={hostname | ip_address} >>>> Specifies the name of the node to be used by GPFS administration >>>> commands when communicating between nodes. The admin node name must >>>> be specified as an IP address or a hostname that is resolved by >>>> the host command >>>> tothe desired IP address. If the keyword >>>> DEFAULT is specified, the admin interface for the node is set to be >>>> equal to the daemon interface for the node. >>>> >>>> What exactly means "node-to node-communications" ? >>>> Means DATA or also the "lease renew", and the token communication >>>> between the clients to get/steal the locks to be able to manage >>>> concurrent write to thr same file? >>>> Since we are getting expells ( especially when several clients >>>> contends the same file ) i assumed i have to split this type of >>>> packages from the data stream, but reading the documentation it >>>> looks to me that those internal comunication between nodes use the >>>> daemon-interface wich i suppose are used also for the data. so HOW >>>> exactly i can split them? 
>>>> >>>> >>>> Thanks in advance, >>>> Salvatore >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss atgpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Tue Jul 14 16:11:51 2015 From: jtucker at pixitmedia.com (Jez Tucker) Date: Tue, 14 Jul 2015 16:11:51 +0100 Subject: [gpfsug-discuss] Vim highlighting for GPFS available Message-ID: <55A526B7.6080602@pixitmedia.com> Hi everyone, I've released vim highlighting for GPFS policies as a public git repo. https://github.com/arcapix/vim-gpfs Pull requests welcome. Please enjoy your new colourful world. Jez p.s. Apologies to Emacs users. Head of R&D ArcaStream/Pixit Media -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. From jonbernard at gmail.com Wed Jul 15 09:19:49 2015 From: jonbernard at gmail.com (Jon Bernard) Date: Wed, 15 Jul 2015 10:19:49 +0200 Subject: [gpfsug-discuss] GPFS UG 10 Presentations - Sven Oehme In-Reply-To: References: Message-ID: If I may revive this: is trcio publicly available? Jon Bernard On Fri, May 2, 2014 at 5:06 PM, Bob Oesterlin wrote: > It Sven's presentation, he mentions a tools "trcio" (in > /xcat/oehmes/gpfs-clone) > > Where can I find that? > > Bob Oesterlin > > > > On Fri, May 2, 2014 at 9:49 AM, Jez Tucker (Chair) > wrote: > >> Hello all >> >> Firstly, thanks for the feedback we've had so far. Very much >> appreciated. >> >> Secondly, GPFS UG 10 Presentations are now available on the Presentations >> section of the website. >> Any outstanding presentations will follow shortly. >> >> See: http://www.gpfsug.org/ >> >> Best regards, >> >> Jez >> >> UG Chair >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Wed Jul 15 10:19:58 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 15 Jul 2015 10:19:58 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: <55A625BE.9000809@ebi.ac.uk> Thanks for the input.. this is actually very interesting! 
Reading here: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview , specifically the "Using more than one network" part, it seems to me that this way we should be able to split the lease/token/ping traffic from the data. Supposing that I implement a GSS cluster with only NSD servers and a second cluster with only clients: As far as I understood, if on the NSD cluster I add first the subnet 10.20.0.0/16 and then 10.30.0.0, it should use the internal network for all the node-to-node communication, leaving the 10.30.0.0/30 only for data traffic with the remote cluster (the clients). Similarly, in the client cluster, adding first 10.10.0.0/16 and then 10.30.0.0 will guarantee that the node-to-node communication passes through a different interface from the one the data is passing over. Since the clients are just "clients", the traffic through 10.10.0.0/16 should be minimal (only token, lease, ping and so on) and not affected by the rest. It should also be possible at this point to move the "admin network" onto the internal interface, so we have effectively split all the "non data" traffic onto a dedicated interface. I'm wondering if I'm missing something, and in case I didn't, what could be the real traffic in the internal (black) networks (is a 1g link fine, or do I still need 10g for that)? Another thing I'm wondering about is the load of the "non data" traffic between the clusters.. I suppose some "daemon traffic" goes through the blue interface for the inter-cluster communication. Any thoughts ? Salvatore On 13/07/15 18:19, Muhammad Habib wrote: > Did you look at "subnets" parameter used with "mmchconfig" command. I > think you can use order list of subnets for daemon communication and > then actual daemon interface can be used for data transfer. When the > GPFS will start it will use actual daemon interface for communication > , however , once its started , it will use the IPs from the subnet > list whichever coming first in the list. To further validate , you > can put network sniffer before you do actual implementation or > alternatively you can open a PMR with IBM. > > If your cluster having expel situation , you may fine tune your > cluster e.g. increase ping timeout period , having multiple NSD > servers and distributing filesystems across these NSD servers. Also > critical servers can have HBA cards installed for direct I/O through > fiber. > > Thanks > > On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick > wrote: > > Hi, > > Yes having separate data and management networks has been critical > for us for keeping health monitoring/communication unimpeded by > data movement. > > Not as important, but you can also tune the networks differently > (packet sizes, buffer sizes, SAK, etc) which can help. > > Jason > > On Jul 13, 2015, at 7:25 AM, Vic Cornell > wrote: > >> Hi Salvatore, >> >> I agree that that is what the manual - and some of the wiki >> entries say. >> >> However , when we have had problems (typically congestion) with >> ethernet networks in the past (20GbE or 40GbE) we have resolved >> them by setting up a separate ?Admin? network. >> >> The before and after cluster health we have seen measured in >> number of expels and waiters has been very marked. >> >> Maybe someone ?in the know? could comment on this split. >> >> Regards, >> >> Vic >> >> >>> On 13 Jul 2015, at 14:29, Salvatore Di Nardo >> > wrote: >>> >>> Hello Vic.
>>> We are currently draining our gpfs to do all the recabling to >>> add a management network, but looking what the admin interface >>> does ( man mmchnode ) it says something different: >>> >>> --admin-interface={hostname | ip_address} >>> Specifies the name of the node to be used by GPFS >>> administration commands when communicating between >>> nodes. The admin node name must be specified as an IP >>> address or a hostname that is resolved by the host >>> command to the desired IP address. If the keyword >>> DEFAULT is specified, the admin interface for the >>> node is set to be equal to the daemon interface for the >>> node. >>> >>> >>> So, seems used only for commands propagation, hence have >>> nothing to do with the node-to-node traffic. Infact the other >>> interface description is: >>> >>> --daemon-interface={hostname | ip_address} >>> Specifies the host name or IP address _*to be used by >>> the GPFS daemons for node-to-node communication*_. The >>> host name or IP address must refer to the commu- >>> nication adapter over which the GPFS daemons >>> communicate. Alias interfaces are not allowed. Use the >>> original address or a name that is resolved by the >>> host command to that original address. >>> >>> >>> The "expired lease" issue and file locking mechanism a( most of >>> our expells happens when 2 clients try to write in the same >>> file) are exactly node-to node-comunication, so im wondering >>> what's the point to separate the "admin network". I want to be >>> sure to plan the right changes before we do a so massive task. >>> We are talking about adding a new interface on 700 clients, so >>> the recabling work its not small. >>> >>> >>> Regards, >>> Salvatore >>> >>> >>> >>> On 13/07/15 14:00, Vic Cornell wrote: >>>> Hi Salavatore, >>>> >>>> Does your GSS have the facility for a 1GbE ?management? >>>> network? If so I think that changing the ?admin? node names of >>>> the cluster members to a set of IPs on the management network >>>> would give you the split that you need. >>>> >>>> What about the clients? Can they also connect to a separate >>>> admin network? >>>> >>>> Remember that if you are using multi-cluster all of the nodes >>>> in both networks must share the same admin network. >>>> >>>> Kind Regards, >>>> >>>> Vic >>>> >>>> >>>>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo >>>>> > wrote: >>>>> >>>>> Anyone? >>>>> >>>>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>>>> Hello guys. >>>>>> Quite a while ago i mentioned that we have a big expel issue >>>>>> on our gss ( first gen) and white a lot people suggested that >>>>>> the root cause could be that we use the same interface for >>>>>> all the traffic, and that we should split the data network >>>>>> from the admin network. Finally we could plan a downtime and >>>>>> we are migrating the data out so, i can soon safelly play >>>>>> with the change, but looking what exactly i should to do i'm >>>>>> a bit puzzled. 
Our mmlscluster looks like this: >>>>>> >>>>>> GPFS cluster information >>>>>> ======================== >>>>>> GPFS cluster name: GSS.ebi.ac.uk >>>>>> >>>>>> GPFS cluster id: 17987981184946329605 >>>>>> GPFS UID domain: GSS.ebi.ac.uk >>>>>> >>>>>> Remote shell command: /usr/bin/ssh >>>>>> Remote file copy command: /usr/bin/scp >>>>>> >>>>>> GPFS cluster configuration servers: >>>>>> ----------------------------------- >>>>>> Primary server: gss01a.ebi.ac.uk >>>>>> >>>>>> Secondary server: gss02b.ebi.ac.uk >>>>>> >>>>>> >>>>>> Node Daemon node name IP address Admin node >>>>>> name Designation >>>>>> ----------------------------------------------------------------------- >>>>>> 1 gss01a.ebi.ac.uk >>>>>> 10.7.28.2 gss01a.ebi.ac.uk >>>>>> quorum-manager >>>>>> 2 gss01b.ebi.ac.uk >>>>>> 10.7.28.3 gss01b.ebi.ac.uk >>>>>> quorum-manager >>>>>> 3 gss02a.ebi.ac.uk >>>>>> 10.7.28.67 gss02a.ebi.ac.uk >>>>>> quorum-manager >>>>>> 4 gss02b.ebi.ac.uk >>>>>> 10.7.28.66 gss02b.ebi.ac.uk >>>>>> quorum-manager >>>>>> 5 gss03a.ebi.ac.uk >>>>>> 10.7.28.34 gss03a.ebi.ac.uk >>>>>> quorum-manager >>>>>> 6 gss03b.ebi.ac.uk >>>>>> 10.7.28.35 gss03b.ebi.ac.uk >>>>>> quorum-manager >>>>>> >>>>>> >>>>>> It was my understanding that the "admin node" should use a >>>>>> different interface ( a 1g link copper should be fine), while >>>>>> the daemon node is where the data was passing , so should >>>>>> point to the bonded 10g interfaces. but when i read the >>>>>> mmchnode man page i start to be quite confused. It says: >>>>>> >>>>>> --daemon-interface={hostname | ip_address} >>>>>> Specifies the host name or IP address _*to be used by the >>>>>> GPFS daemons for node-to-node communication*_. The host name >>>>>> or IP address must refer to the communication adapter over >>>>>> which the GPFS daemons communicate. >>>>>> Alias interfaces are not allowed. Use the original address or >>>>>> a name that is resolved by the host command to that original >>>>>> address. >>>>>> >>>>>> --admin-interface={hostname | ip_address} >>>>>> Specifies the name of the node to be used by GPFS >>>>>> administration commands when communicating between nodes. The >>>>>> admin node name must be specified as an IP address or a >>>>>> hostname that is resolved by the host command >>>>>> tothe desired IP address. If the >>>>>> keyword DEFAULT is specified, the admin interface for the >>>>>> node is set to be equal to the daemon interface for the node. >>>>>> >>>>>> What exactly means "node-to node-communications" ? >>>>>> Means DATA or also the "lease renew", and the token >>>>>> communication between the clients to get/steal the locks to >>>>>> be able to manage concurrent write to thr same file? >>>>>> Since we are getting expells ( especially when several >>>>>> clients contends the same file ) i assumed i have to split >>>>>> this type of packages from the data stream, but reading the >>>>>> documentation it looks to me that those internal comunication >>>>>> between nodes use the daemon-interface wich i suppose are >>>>>> used also for the data. so HOW exactly i can split them? 
>>>>>> >>>>>> >>>>>> Thanks in advance, >>>>>> Salvatore >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> gpfsug-discuss mailing list >>>>>> gpfsug-discuss atgpfsug.org >>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -- > This communication contains confidential information intended only for > the persons to whom it is addressed. Any other distribution, copying > or disclosure is strictly prohibited. If you have received this > communication in error, please notify the sender and delete this > e-mail message immediately. > > Le pr?sent message contient des renseignements de nature > confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute > diffusion, distribution, divulgation, utilisation ou reproduction de > la pr?sente communication, et de tout fichier qui y est joint, est > strictement interdite. Si vous avez re?u le pr?sent message > ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur > et supprimer le message de votre ordinateur et de votre serveur. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpfs.jpg Type: image/jpeg Size: 28904 bytes Desc: not available URL: From oehmes at gmail.com Wed Jul 15 15:33:11 2015 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 15 Jul 2015 14:33:11 +0000 Subject: [gpfsug-discuss] GPFS UG 10 Presentations - Sven Oehme In-Reply-To: References: Message-ID: Hi Jon, the answer is no, its an development internal tool. sven On Wed, Jul 15, 2015 at 1:20 AM Jon Bernard wrote: > If I may revive this: is trcio publicly available? > > Jon Bernard > > On Fri, May 2, 2014 at 5:06 PM, Bob Oesterlin wrote: > >> It Sven's presentation, he mentions a tools "trcio" (in >> /xcat/oehmes/gpfs-clone) >> >> Where can I find that? >> >> Bob Oesterlin >> >> >> >> On Fri, May 2, 2014 at 9:49 AM, Jez Tucker (Chair) >> wrote: >> >>> Hello all >>> >>> Firstly, thanks for the feedback we've had so far. Very much >>> appreciated. >>> >>> Secondly, GPFS UG 10 Presentations are now available on the >>> Presentations section of the website. >>> Any outstanding presentations will follow shortly. 
>>> >>> See: http://www.gpfsug.org/ >>> >>> Best regards, >>> >>> Jez >>> >>> UG Chair >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Jul 15 15:37:57 2015 From: ewahl at osc.edu (Wahl, Edward) Date: Wed, 15 Jul 2015 14:37:57 +0000 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A625BE.9000809@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> , <55A625BE.9000809@ebi.ac.uk> Message-ID: <9DA9EC7A281AC7428A9618AFDC49049955A606E4@CIO-KRC-D1MBX02.osuad.osu.edu> I don't see this in the thread but perhaps I missed it, what version are you running? I'm still on 3.5 so this is all based on that. A few notes for a little "heads up" here hoping to help with the pitfalls. I seem to recall a number of caveats when I did this a while back. Such as using the 'subnets' option being discussed, stops GPFS from failing over to other TCP networks when there are failures. VERY important! 'mmdiag --network' will show your setup. Definitely verify this if failing downwards is in your plans. We fail from 56Gb RDMA->10GbE TCP-> 1GbE here. And having had it work during some bad power events last year it was VERY nice that the users only noticed a slowdown when we completely lost Lustre and other resources. Also I recall that there was a restriction on having multiple private networks, and some special switch to force this. I have a note about "privateSubnetOverride" so you might read up about this. I seem to recall this was for TCP connections and daemonnodename being a private IP. Or maybe it was that AND mmlscluster having private IPs as well? I think the developerworks wiki had some writeup on this. I don't see it in the admin manuals. Hopefully this may help as you plan this out. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Wednesday, July 15, 2015 5:19 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] data interface and management infercace. Thanks for the input.. this is actually very interesting! Reading here: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview , specifically the " Using more than one network" part it seems to me that this way we should be able to split the lease/token/ping from the data. Supposing that I implement a GSS cluster with only NDS and a second cluster with only clients: [cid:part1.03040109.00080709 at ebi.ac.uk] As far i understood if on the NDS cluster add first the subnet 10.20.0.0/16 and then 10.30.0.0 is should use the internal network for all the node-to-node comunication, leaving the 10.30.0.0/30 only for data traffic witht he remote cluster ( the clients). 
Similarly, in the client cluster, adding first 10.10.0.0/16 and then 10.30.0.0, will guarantee than the node-to-node comunication pass trough a different interface there the data is passing. Since the client are just "clients" the traffic trough 10.10.0.0/16 should be minimal (only token ,lease, ping and so on ) and not affected by the rest. Should be possible at this point move aldo the "admin network" on the internal interface, so we effectively splitted all the "non data" traffic on a dedicated interface. I'm wondering if I'm missing something, and in case i didn't, what could be the real traffic in the internal (black) networks ( 1g link its fine or i still need 10g for that). Another thing I I'm wondering its the load of the "non data" traffic between the clusters.. i suppose some "daemon traffic" goes trough the blue interface for the inter-cluster communication. Any thoughts ? Salvatore On 13/07/15 18:19, Muhammad Habib wrote: Did you look at "subnets" parameter used with "mmchconfig" command. I think you can use order list of subnets for daemon communication and then actual daemon interface can be used for data transfer. When the GPFS will start it will use actual daemon interface for communication , however , once its started , it will use the IPs from the subnet list whichever coming first in the list. To further validate , you can put network sniffer before you do actual implementation or alternatively you can open a PMR with IBM. If your cluster having expel situation , you may fine tune your cluster e.g. increase ping timeout period , having multiple NSD servers and distributing filesystems across these NSD servers. Also critical servers can have HBA cards installed for direct I/O through fiber. Thanks On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick > wrote: Hi, Yes having separate data and management networks has been critical for us for keeping health monitoring/communication unimpeded by data movement. Not as important, but you can also tune the networks differently (packet sizes, buffer sizes, SAK, etc) which can help. Jason On Jul 13, 2015, at 7:25 AM, Vic Cornell > wrote: Hi Salvatore, I agree that that is what the manual - and some of the wiki entries say. However , when we have had problems (typically congestion) with ethernet networks in the past (20GbE or 40GbE) we have resolved them by setting up a separate ?Admin? network. The before and after cluster health we have seen measured in number of expels and waiters has been very marked. Maybe someone ?in the know? could comment on this split. Regards, Vic On 13 Jul 2015, at 14:29, Salvatore Di Nardo > wrote: Hello Vic. We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. 
The host name or IP address must refer to the commu- nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. Regards, Salvatore On 13/07/15 14:00, Vic Cornell wrote: Hi Salavatore, Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. What about the clients? Can they also connect to a separate admin network? Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. Kind Regards, Vic On 13 Jul 2015, at 13:31, Salvatore Di Nardo > wrote: Anyone? On 10/07/15 11:07, Salvatore Di Nardo wrote: Hello guys. Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. 
The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. What exactly means "node-to node-communications" ? Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? Thanks in advance, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- This communication contains confidential information intended only for the persons to whom it is addressed. Any other distribution, copying or disclosure is strictly prohibited. If you have received this communication in error, please notify the sender and delete this e-mail message immediately. Le pr?sent message contient des renseignements de nature confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, distribution, divulgation, utilisation ou reproduction de la pr?sente communication, et de tout fichier qui y est joint, est strictement interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur et supprimer le message de votre ordinateur et de votre serveur. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpfs.jpg Type: image/jpeg Size: 28904 bytes Desc: gpfs.jpg URL: From S.J.Thompson at bham.ac.uk Sun Jul 19 11:45:09 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sun, 19 Jul 2015 10:45:09 +0000 Subject: [gpfsug-discuss] 4.1.1 immutable filesets Message-ID: I was wondering if anyone had looked at the immutable fileset features in 4.1.1? In particular I was looking at the iam compliant mode, but I've a couple of questions. * if I have an iam compliant fileset, and it contains immutable files or directories, can I still unlink and delete the filset? * will HSM work with immutable files? I.e. Can I migrate files to tape and restore them? The docs mention that extended attributes can be updated internally by dmapi, so I guess HSM might work? 
Thanks Simon From kraemerf at de.ibm.com Sun Jul 19 13:45:35 2015 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Sun, 19 Jul 2015 14:45:35 +0200 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: Message-ID: >I was wondering if anyone had looked at the immutable fileset features in >4.1.1? yes, Nils Haustein has, see: https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insight_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From S.J.Thompson at bham.ac.uk Sun Jul 19 14:35:47 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sun, 19 Jul 2015 13:35:47 +0000 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: , Message-ID: Hi Frank, Yeah I'd read that this morning, which is why I was asking... I couldn't see anything about HSM in there, or whether it's possible to delete a fileset with immutable files. I remember Scott (maybe) mentioning it at the GPFS UG meeting in York, but I thought that was immutable file systems, which you have to destroy. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Frank Kraemer [kraemerf at de.ibm.com] Sent: 19 July 2015 13:45 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Immutable fileset features >I was wondering if anyone had looked at the immutable fileset features in >4.1.1? yes, Nils Haustein has see: https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insight_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Sun Jul 19 21:09:26 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sun, 19 Jul 2015 20:09:26 +0000 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: Message-ID: On 19/07/2015 13:45, "Frank Kraemer" wrote: >>I was wondering if anyone had looked at the immutable fileset features in >4.1.1? > >yes, Nils Haustein has see: > >https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insi >ght_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function I was re-reading some of this blog post and am a little confused. It talks about setting retention times by setting the ATIME from touch, or by using -E to mmchattr. Does that mean that if a file is accessed, the ATIME is updated and so the retention period is changed? What if our retention policy is based on the last access time of the file plus a period of time? I was thinking it would be useful to do a policy scan to find newly accessed files and then set the retention (either directly by policy if possible? Or by passing the file list to a script). Would this work, or if the ATIME is overloaded, then I guess we can't use this? Finally, is this a feature that is supported by IBM?
The -E flag for mmchattr is neither in the man page nor the online docs at: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect rum.scale.v4r11.adm.doc/bl1adm_mmchattr.htm (My possibly incorrect understanding was that if its documented, then is supported, otherwise it might work)? Simon From jamiedavis at us.ibm.com Mon Jul 20 13:26:17 2015 From: jamiedavis at us.ibm.com (James Davis) Date: Mon, 20 Jul 2015 08:26:17 -0400 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: Message-ID: <201507200027.t6K0RD8b003417@d01av02.pok.ibm.com> Simon, I spoke to a tester who worked on this line item. She thinks mmchattr -E should have been documented. We will follow up. If it was an oversight it should be corrected soon. Jamie Jamie Davis GPFS Functional Verification Test (FVT) jamiedavis at us.ibm.com From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 19-07-15 04:09 PM Subject: Re: [gpfsug-discuss] Immutable fileset features Sent by: gpfsug-discuss-bounces at gpfsug.org On 19/07/2015 13:45, "Frank Kraemer" wrote: >>I was wondering if anyone had looked at the immutable fileset features in >4.1.1? > >yes, Nils Haustein has see: > >https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insi >ght_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function I was re-reading some of this blog post and am a little confused. It talks about setting retention times by setting the ATIME from touch, or by using -E to mmchattr. Does that mean if a file is accessed, then the ATIME is updated and so the retention period is changed? What if our retention policy is based on last access time of file +period of time. I was thinking it would be useful to do a policy scan to find newly access files and then set the retention (either directly by policy if possible? Or by passing the file list to a script). Would this work or if the ATIME is overloaded, then I guess we can?t use this? Finally, is this a feature that is supported by IBM? The -E flag for mmchattr is neither in the man page nor the online docs at: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect rum.scale.v4r11.adm.doc/bl1adm_mmchattr.htm (My possibly incorrect understanding was that if its documented, then is supported, otherwise it might work)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Luke.Raimbach at crick.ac.uk Mon Jul 20 08:02:01 2015 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Mon, 20 Jul 2015 07:02:01 +0000 Subject: [gpfsug-discuss] 4.1.1 immutable filesets In-Reply-To: References: Message-ID: Can I add to this list of questions? Apparently, one cannot set immutable, or append-only attributes on files / directories within an AFM cache. However, if I have an independent writer and set immutability at home, what does the AFM IW cache do about this? Or does this restriction just apply to entire filesets (which would make more sense)? Cheers, Luke. 
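P.S. For anyone experimenting with this, my mental model of the basic (non-AFM) mechanics, based on the blog, is roughly the following - the file system, fileset and file names are invented and the options are worth verifying against the 4.1.1 docs:

   # put the fileset into an IAM mode (e.g. noncompliant)
   mmchfileset gpfs01 archfset --iam-mode noncompliant
   # set the retention time via atime, then mark the file immutable
   touch -a -t 203012312359 /gpfs/gpfs01/archfset/results.dat
   mmchattr -i yes /gpfs/gpfs01/archfset/results.dat
   # check what got set
   mmlsattr -L /gpfs/gpfs01/archfset/results.dat

How any of that interacts with an AFM cache is exactly what I am unsure about.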
-----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: 19 July 2015 11:45 To: gpfsug main discussion list Subject: [gpfsug-discuss] 4.1.1 immutable filesets I was wondering if anyone had looked at the immutable fileset features in 4.1.1? In particular I was looking at the iam compliant mode, but I've a couple of questions. * if I have an iam compliant fileset, and it contains immutable files or directories, can I still unlink and delete the filset? * will HSM work with immutable files? I.e. Can I migrate files to tape and restore them? The docs mention that extended attributes can be updated internally by dmapi, so I guess HSM might work? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From kallbac at iu.edu Wed Jul 22 11:50:58 2015 From: kallbac at iu.edu (Kristy Kallback-Rose) Date: Wed, 22 Jul 2015 06:50:58 -0400 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Message-ID: <203758A6-C7E0-4D3F-BA31-A130CF92DCBC@iu.edu> Yes interested, please post. We?ll probably keep running Samba separately, as we do today, for quite some time, but will be facing this transition at some point so we can be supported by IBM for Samba. On Jul 10, 2015, at 8:06 AM, Simon Thompson (Research Computing - IT Services) wrote: > So IBM came back and said what I was doing wasn?t supported. > > They did say that you can use ?user defined? authentication. Which I?ve > got working now on my environment (figured what I was doing wrong, and you > can?t use mmsmb to do some of the bits I need for it to work for user > defined mode for me...). But I still think it needs a patch to one of the > files for CES for use in user defined authentication. (Right now it > appears to remove all my ?user defined? settings from nsswitch.conf when > you stop CES/GPFS on a node). I?ve supplied my patch to IBM which works > for my case, we?ll see what they do about it? > > (If people are interested, I?ll gather my notes into a blog post). > > Simon > > On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: > >> Just to chime in as another interested party, we do something fairly >> similar but use sssd instead of nslcd. Very interested to see how >> accommodating the IBM Samba is to local configuration needs. >> >> Best, >> Kristy >> >> On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >> Services) wrote: >> >>> Hi, >>> >>> (sorry, lots of questions about this stuff at the moment!) >>> >>> I?m currently looking at removing the sernet smb configs we had >>> previously >>> and moving to IBM SMB. I?ve removed all the old packages and only now >>> have >>> gpfs.smb installed on the systems. >>> >>> I?m struggling to get the config tools to work for our environment. >>> >>> We have MS Windows AD Domain for authentication. For various reasons, >>> however doesn?t hold the UIDs/GIDs, which are instead held in a >>> different >>> LDAP directory. >>> >>> In the past, we?d configure the Linux servers running Samba so that >>> NSLCD >>> was configured to get details from the LDAP server. (e.g. 
getent passwd >>> would return the data for an AD user). The Linux boxes would also be >>> configured to use KRB5 authentication where users were allowed to ssh >>> etc >>> in for password authentication. >>> >>> So as far as Samba was concerned, it would do ?security = ADS? and then >>> we?d also have "idmap config * : backend = tdb2? >>> >>> I.e. Use Domain for authentication, but look locally for ID mapping >>> data. >>> >>> Now I can configured IBM SMB to use ADS for authentication: >>> >>> mmuserauth service create --type ad --data-access-method file >>> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >>> --idmap-role subordinate >>> >>> >>> However I can?t see anyway for me to manipulate the config so that it >>> doesn?t use autorid. Using this we end up with: >>> >>> mmsmb config list | grep -i idmap >>> idmap config * : backend autorid >>> idmap config * : range 10000000-299999999 >>> idmap config * : rangesize 1000000 >>> idmap config * : read only yes >>> idmap:cache no >>> >>> >>> It also adds: >>> >>> mmsmb config list | grep -i auth >>> auth methods guest sam winbind >>> >>> (though I don?t think that is a problem). >>> >>> >>> I also can?t change the idmap using the mmsmb command (I think would >>> look >>> like this): >>> # mmsmb config change --option="idmap config * : backend=tdb2" >>> idmap config * : backend=tdb2: [E] Unsupported smb option. More >>> information about smb options is availabe in the man page. >>> >>> >>> >>> I can?t see anything in the docs at: >>> >>> http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>> ct >>> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >>> >>> That give me a clue how to do what I want. >>> >>> I?d be happy to do some mixture of AD for authentication and LDAP for >>> lookups (rather than just falling back to ?local? from nslcd), but I >>> can?t >>> see a way to do this, and ?manual? seems to stop ADS authentication in >>> Samba. >>> >>> Anyone got any suggestions? >>> >>> >>> Thanks >>> >>> Simon >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 495 bytes Desc: Message signed with OpenPGP using GPGMail URL: From S.J.Thompson at bham.ac.uk Wed Jul 22 11:59:56 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 22 Jul 2015 10:59:56 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: <203758A6-C7E0-4D3F-BA31-A130CF92DCBC@iu.edu> References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> , <203758A6-C7E0-4D3F-BA31-A130CF92DCBC@iu.edu> Message-ID: Hi Kristy, Funny you should ask, I wrote it up last night... http://www.roamingzebra.co.uk/2015/07/smb-protocol-support-with-spectrum.html They did tell me it was all tested with Samba 4, so should work, subject to you checking your own smb config options. 
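(Very roughly, the user-defined route boils down to something like this - the share name and path are invented here, so check the mmuserauth and mmsmb man pages rather than taking the flags on trust:

   mmuserauth service create --data-access-method file --type userdefined
   mmsmb export add research /gpfs/gpfs01/research
   mmsmb export list

with the AD/krb5/nsswitch side then configured by hand outside of the CES tooling.)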
But i like not having to build it myself now ;) The move was actually pretty easy and in theory you can run mixed over existing nodes and upgraded protocol nodes, but you might need a different clustered name. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Kristy Kallback-Rose [kallbac at iu.edu] Sent: 22 July 2015 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SMB support and config Yes interested, please post. We?ll probably keep running Samba separately, as we do today, for quite some time, but will be facing this transition at some point so we can be supported by IBM for Samba. On Jul 10, 2015, at 8:06 AM, Simon Thompson (Research Computing - IT Services) wrote: > So IBM came back and said what I was doing wasn?t supported. > > They did say that you can use ?user defined? authentication. Which I?ve > got working now on my environment (figured what I was doing wrong, and you > can?t use mmsmb to do some of the bits I need for it to work for user > defined mode for me...). But I still think it needs a patch to one of the > files for CES for use in user defined authentication. (Right now it > appears to remove all my ?user defined? settings from nsswitch.conf when > you stop CES/GPFS on a node). I?ve supplied my patch to IBM which works > for my case, we?ll see what they do about it? > > (If people are interested, I?ll gather my notes into a blog post). > > Simon > > On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: > >> Just to chime in as another interested party, we do something fairly >> similar but use sssd instead of nslcd. Very interested to see how >> accommodating the IBM Samba is to local configuration needs. >> >> Best, >> Kristy >> >> On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >> Services) wrote: >> >>> Hi, >>> >>> (sorry, lots of questions about this stuff at the moment!) >>> >>> I?m currently looking at removing the sernet smb configs we had >>> previously >>> and moving to IBM SMB. I?ve removed all the old packages and only now >>> have >>> gpfs.smb installed on the systems. >>> >>> I?m struggling to get the config tools to work for our environment. >>> >>> We have MS Windows AD Domain for authentication. For various reasons, >>> however doesn?t hold the UIDs/GIDs, which are instead held in a >>> different >>> LDAP directory. >>> >>> In the past, we?d configure the Linux servers running Samba so that >>> NSLCD >>> was configured to get details from the LDAP server. (e.g. getent passwd >>> would return the data for an AD user). The Linux boxes would also be >>> configured to use KRB5 authentication where users were allowed to ssh >>> etc >>> in for password authentication. >>> >>> So as far as Samba was concerned, it would do ?security = ADS? and then >>> we?d also have "idmap config * : backend = tdb2? >>> >>> I.e. Use Domain for authentication, but look locally for ID mapping >>> data. >>> >>> Now I can configured IBM SMB to use ADS for authentication: >>> >>> mmuserauth service create --type ad --data-access-method file >>> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >>> --idmap-role subordinate >>> >>> >>> However I can?t see anyway for me to manipulate the config so that it >>> doesn?t use autorid. 
Using this we end up with: >>> >>> mmsmb config list | grep -i idmap >>> idmap config * : backend autorid >>> idmap config * : range 10000000-299999999 >>> idmap config * : rangesize 1000000 >>> idmap config * : read only yes >>> idmap:cache no >>> >>> >>> It also adds: >>> >>> mmsmb config list | grep -i auth >>> auth methods guest sam winbind >>> >>> (though I don?t think that is a problem). >>> >>> >>> I also can?t change the idmap using the mmsmb command (I think would >>> look >>> like this): >>> # mmsmb config change --option="idmap config * : backend=tdb2" >>> idmap config * : backend=tdb2: [E] Unsupported smb option. More >>> information about smb options is availabe in the man page. >>> >>> >>> >>> I can?t see anything in the docs at: >>> >>> http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>> ct >>> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >>> >>> That give me a clue how to do what I want. >>> >>> I?d be happy to do some mixture of AD for authentication and LDAP for >>> lookups (rather than just falling back to ?local? from nslcd), but I >>> can?t >>> see a way to do this, and ?manual? seems to stop ADS authentication in >>> Samba. >>> >>> Anyone got any suggestions? >>> >>> >>> Thanks >>> >>> Simon >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mhabib73 at gmail.com Wed Jul 22 13:58:51 2015 From: mhabib73 at gmail.com (Muhammad Habib) Date: Wed, 22 Jul 2015 08:58:51 -0400 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A625BE.9000809@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> <55A625BE.9000809@ebi.ac.uk> Message-ID: did you implement it ? looks ok. All daemon traffic should be going through black network including inter-cluster daemon traffic ( assume black subnet routable). All data traffic should be going through the blue network. You may need to run iptrace or tcpdump to make sure proper network are in use. You can always open a PMR if you having issue during the configuration . Thanks On Wed, Jul 15, 2015 at 5:19 AM, Salvatore Di Nardo wrote: > Thanks for the input.. this is actually very interesting! > > Reading here: > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview > , > specifically the " Using more than one network" part it seems to me that > this way we should be able to split the lease/token/ping from the data. > > Supposing that I implement a GSS cluster with only NDS and a second > cluster with only clients: > > > > As far i understood if on the NDS cluster add first the subnet > 10.20.0.0/16 and then 10.30.0.0 is should use the internal network for > all the node-to-node comunication, leaving the 10.30.0.0/30 only for data > traffic witht he remote cluster ( the clients). 
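> To make that concrete, on the NSD cluster I think it would be something
> like the following (the subnet values are just the ones from my diagram,
> and the exact syntax wants checking against the mmchconfig man page):
>
>    mmchconfig subnets="10.20.0.0 10.30.0.0"
>    mmlsconfig subnets
>    mmdiag --network
>
> with the daemon interfaces staying on the 10.30.0.0 addresses, and
> mmdiag --network showing which address each connection actually ends up
> using.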
Similarly, in the client > cluster, adding first 10.10.0.0/16 and then 10.30.0.0, will guarantee > than the node-to-node comunication pass trough a different interface there > the data is passing. Since the client are just "clients" the traffic trough > 10.10.0.0/16 should be minimal (only token ,lease, ping and so on ) and > not affected by the rest. Should be possible at this point move aldo the > "admin network" on the internal interface, so we effectively splitted all > the "non data" traffic on a dedicated interface. > > I'm wondering if I'm missing something, and in case i didn't, what could > be the real traffic in the internal (black) networks ( 1g link its fine or > i still need 10g for that). Another thing I I'm wondering its the load of > the "non data" traffic between the clusters.. i suppose some "daemon > traffic" goes trough the blue interface for the inter-cluster > communication. > > > Any thoughts ? > > Salvatore > > On 13/07/15 18:19, Muhammad Habib wrote: > > Did you look at "subnets" parameter used with "mmchconfig" command. I > think you can use order list of subnets for daemon communication and then > actual daemon interface can be used for data transfer. When the GPFS will > start it will use actual daemon interface for communication , however , > once its started , it will use the IPs from the subnet list whichever > coming first in the list. To further validate , you can put network > sniffer before you do actual implementation or alternatively you can open a > PMR with IBM. > > If your cluster having expel situation , you may fine tune your cluster > e.g. increase ping timeout period , having multiple NSD servers and > distributing filesystems across these NSD servers. Also critical servers > can have HBA cards installed for direct I/O through fiber. > > Thanks > > On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick wrote: > >> Hi, >> >> Yes having separate data and management networks has been critical for >> us for keeping health monitoring/communication unimpeded by data movement. >> >> Not as important, but you can also tune the networks differently >> (packet sizes, buffer sizes, SAK, etc) which can help. >> >> Jason >> >> On Jul 13, 2015, at 7:25 AM, Vic Cornell wrote: >> >> Hi Salvatore, >> >> I agree that that is what the manual - and some of the wiki entries say. >> >> However , when we have had problems (typically congestion) with >> ethernet networks in the past (20GbE or 40GbE) we have resolved them by >> setting up a separate ?Admin? network. >> >> The before and after cluster health we have seen measured in number of >> expels and waiters has been very marked. >> >> Maybe someone ?in the know? could comment on this split. >> >> Regards, >> >> Vic >> >> >> On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: >> >> Hello Vic. >> We are currently draining our gpfs to do all the recabling to add a >> management network, but looking what the admin interface does ( man >> mmchnode ) it says something different: >> >> --admin-interface={hostname | ip_address} >> Specifies the name of the node to be used by >> GPFS administration commands when communicating between nodes. The admin >> node name must be specified as an IP >> address or a hostname that is resolved by the >> host command to the desired IP address. If the keyword DEFAULT is >> specified, the admin interface for the >> node is set to be equal to the daemon interface >> for the node. >> >> >> So, seems used only for commands propagation, hence have nothing to do >> with the node-to-node traffic. 
Infact the other interface description is: >> >> --daemon-interface={hostname | ip_address} >> Specifies the host name or IP address *to be >> used by the GPFS daemons for node-to-node communication*. The host name >> or IP address must refer to the commu- >> nication adapter over which the GPFS daemons >> communicate. Alias interfaces are not allowed. Use the original address or >> a name that is resolved by the >> host command to that original address. >> >> >> The "expired lease" issue and file locking mechanism a( most of our >> expells happens when 2 clients try to write in the same file) are exactly >> node-to node-comunication, so im wondering what's the point to separate >> the "admin network". I want to be sure to plan the right changes before we >> do a so massive task. We are talking about adding a new interface on 700 >> clients, so the recabling work its not small. >> >> >> Regards, >> Salvatore >> >> >> >> On 13/07/15 14:00, Vic Cornell wrote: >> >> Hi Salavatore, >> >> Does your GSS have the facility for a 1GbE ?management? network? If so >> I think that changing the ?admin? node names of the cluster members to a >> set of IPs on the management network would give you the split that you need. >> >> What about the clients? Can they also connect to a separate admin >> network? >> >> Remember that if you are using multi-cluster all of the nodes in both >> networks must share the same admin network. >> >> Kind Regards, >> >> Vic >> >> >> On 13 Jul 2015, at 13:31, Salvatore Di Nardo wrote: >> >> Anyone? >> >> On 10/07/15 11:07, Salvatore Di Nardo wrote: >> >> Hello guys. >> Quite a while ago i mentioned that we have a big expel issue on our gss >> ( first gen) and white a lot people suggested that the root cause could be >> that we use the same interface for all the traffic, and that we should >> split the data network from the admin network. Finally we could plan a >> downtime and we are migrating the data out so, i can soon safelly play with >> the change, but looking what exactly i should to do i'm a bit puzzled. Our >> mmlscluster looks like this: >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> It was my understanding that the "admin node" should use a different >> interface ( a 1g link copper should be fine), while the daemon node is >> where the data was passing , so should point to the bonded 10g interfaces. >> but when i read the mmchnode man page i start to be quite confused. It says: >> >> --daemon-interface={hostname | ip_address} >> Specifies the host name or IP address *to be >> used by the GPFS daemons for node-to-node communication*. 
The host name >> or IP address must refer to the communication adapter over which the GPFS >> daemons communicate. >> Alias interfaces are not allowed. Use the >> original address or a name that is resolved by the host command to that >> original address. >> >> --admin-interface={hostname | ip_address} >> Specifies the name of the node to be used by >> GPFS administration commands when communicating between nodes. The admin >> node name must be specified as an IP address or a hostname that is resolved >> by the host command >> to the desired IP address. If the keyword >> DEFAULT is specified, the admin interface for the node is set to be equal >> to the daemon interface for the node. >> >> What exactly means "node-to node-communications" ? >> Means DATA or also the "lease renew", and the token communication between >> the clients to get/steal the locks to be able to manage concurrent write to >> thr same file? >> Since we are getting expells ( especially when several clients contends >> the same file ) i assumed i have to split this type of packages from the >> data stream, but reading the documentation it looks to me that those >> internal comunication between nodes use the daemon-interface wich i suppose >> are used also for the data. so HOW exactly i can split them? >> >> >> Thanks in advance, >> Salvatore >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > > -- > This communication contains confidential information intended only for the > persons to whom it is addressed. Any other distribution, copying or > disclosure is strictly prohibited. If you have received this communication > in error, please notify the sender and delete this e-mail message > immediately. > > Le pr?sent message contient des renseignements de nature confidentielle > r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, > distribution, divulgation, utilisation ou reproduction de la pr?sente > communication, et de tout fichier qui y est joint, est strictement > interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, > veuillez informer imm?diatement l'exp?diteur et supprimer le message de > votre ordinateur et de votre serveur. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This communication contains confidential information intended only for the persons to whom it is addressed. Any other distribution, copying or disclosure is strictly prohibited. 
If you have received this communication in error, please notify the sender and delete this e-mail message immediately. Le pr?sent message contient des renseignements de nature confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, distribution, divulgation, utilisation ou reproduction de la pr?sente communication, et de tout fichier qui y est joint, est strictement interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur et supprimer le message de votre ordinateur et de votre serveur. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpfs.jpg Type: image/jpeg Size: 28904 bytes Desc: not available URL: From sdinardo at ebi.ac.uk Wed Jul 22 14:51:04 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 22 Jul 2015 14:51:04 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> <55A625BE.9000809@ebi.ac.uk> Message-ID: <55AF9FC8.6050107@ebi.ac.uk> Hello, no, still didn't anything because we have to drain 2PB data , into a slower storage.. so it will take few weeks. I expect doing it the second half of August. Will let you all know the results once done and properly tested. Salvatore On 22/07/15 13:58, Muhammad Habib wrote: > did you implement it ? looks ok. All daemon traffic should be going > through black network including inter-cluster daemon traffic ( assume > black subnet routable). All data traffic should be going through the > blue network. You may need to run iptrace or tcpdump to make sure > proper network are in use. You can always open a PMR if you having > issue during the configuration . > > Thanks > > On Wed, Jul 15, 2015 at 5:19 AM, Salvatore Di Nardo > > wrote: > > Thanks for the input.. this is actually very interesting! > > Reading here: > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview > > , > specifically the " Using more than one network" part it seems to > me that this way we should be able to split the lease/token/ping > from the data. > > Supposing that I implement a GSS cluster with only NDS and a > second cluster with only clients: > > > > As far i understood if on the NDS cluster add first the subnet > 10.20.0.0/16 and then 10.30.0.0 is should > use the internal network for all the node-to-node comunication, > leaving the 10.30.0.0/30 only for data > traffic witht he remote cluster ( the clients). Similarly, in the > client cluster, adding first 10.10.0.0/16 > and then 10.30.0.0, will guarantee than the node-to-node > comunication pass trough a different interface there the data is > passing. Since the client are just "clients" the traffic trough > 10.10.0.0/16 should be minimal (only token > ,lease, ping and so on ) and not affected by the rest. Should be > possible at this point move aldo the "admin network" on the > internal interface, so we effectively splitted all the "non data" > traffic on a dedicated interface. > > I'm wondering if I'm missing something, and in case i didn't, what > could be the real traffic in the internal (black) networks ( 1g > link its fine or i still need 10g for that). 
Another thing I I'm > wondering its the load of the "non data" traffic between the > clusters.. i suppose some "daemon traffic" goes trough the blue > interface for the inter-cluster communication. > > > Any thoughts ? > > Salvatore > > On 13/07/15 18:19, Muhammad Habib wrote: >> Did you look at "subnets" parameter used with "mmchconfig" >> command. I think you can use order list of subnets for daemon >> communication and then actual daemon interface can be used for >> data transfer. When the GPFS will start it will use actual >> daemon interface for communication , however , once its started , >> it will use the IPs from the subnet list whichever coming first >> in the list. To further validate , you can put network sniffer >> before you do actual implementation or alternatively you can open >> a PMR with IBM. >> >> If your cluster having expel situation , you may fine tune your >> cluster e.g. increase ping timeout period , having multiple NSD >> servers and distributing filesystems across these NSD servers. >> Also critical servers can have HBA cards installed for direct I/O >> through fiber. >> >> Thanks >> >> On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick > > wrote: >> >> Hi, >> >> Yes having separate data and management networks has been >> critical for us for keeping health monitoring/communication >> unimpeded by data movement. >> >> Not as important, but you can also tune the networks >> differently (packet sizes, buffer sizes, SAK, etc) which can >> help. >> >> Jason >> >> On Jul 13, 2015, at 7:25 AM, Vic Cornell >> > wrote: >> >>> Hi Salvatore, >>> >>> I agree that that is what the manual - and some of the wiki >>> entries say. >>> >>> However , when we have had problems (typically congestion) >>> with ethernet networks in the past (20GbE or 40GbE) we have >>> resolved them by setting up a separate ?Admin? network. >>> >>> The before and after cluster health we have seen measured in >>> number of expels and waiters has been very marked. >>> >>> Maybe someone ?in the know? could comment on this split. >>> >>> Regards, >>> >>> Vic >>> >>> >>>> On 13 Jul 2015, at 14:29, Salvatore Di Nardo >>>> > wrote: >>>> >>>> Hello Vic. >>>> We are currently draining our gpfs to do all the recabling >>>> to add a management network, but looking what the admin >>>> interface does ( man mmchnode ) it says something different: >>>> >>>> --admin-interface={hostname | ip_address} >>>> Specifies the name of the node to be used by GPFS >>>> administration commands when communicating between >>>> nodes. The admin node name must be specified as an IP >>>> address or a hostname that is resolved by the host >>>> command to the desired IP address. If the keyword >>>> DEFAULT is specified, the admin interface for the >>>> node is set to be equal to the daemon interface for >>>> the node. >>>> >>>> >>>> So, seems used only for commands propagation, hence have >>>> nothing to do with the node-to-node traffic. Infact the >>>> other interface description is: >>>> >>>> --daemon-interface={hostname | ip_address} >>>> Specifies the host name or IP address _*to be used >>>> by the GPFS daemons for node-to-node >>>> communication*_. The host name or IP address must >>>> refer to the commu- >>>> nication adapter over which the GPFS daemons >>>> communicate. Alias interfaces are not allowed. Use >>>> the original address or a name that is resolved >>>> by the >>>> host command to that original address. 
>>>> >>>> >>>> The "expired lease" issue and file locking mechanism a( >>>> most of our expells happens when 2 clients try to write in >>>> the same file) are exactly node-to node-comunication, so >>>> im wondering what's the point to separate the "admin >>>> network". I want to be sure to plan the right changes >>>> before we do a so massive task. We are talking about adding >>>> a new interface on 700 clients, so the recabling work its >>>> not small. >>>> >>>> >>>> Regards, >>>> Salvatore >>>> >>>> >>>> >>>> On 13/07/15 14:00, Vic Cornell wrote: >>>>> Hi Salavatore, >>>>> >>>>> Does your GSS have the facility for a 1GbE ?management? >>>>> network? If so I think that changing the ?admin? node >>>>> names of the cluster members to a set of IPs on the >>>>> management network would give you the split that you need. >>>>> >>>>> What about the clients? Can they also connect to a >>>>> separate admin network? >>>>> >>>>> Remember that if you are using multi-cluster all of the >>>>> nodes in both networks must share the same admin network. >>>>> >>>>> Kind Regards, >>>>> >>>>> Vic >>>>> >>>>> >>>>>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo >>>>>> > wrote: >>>>>> >>>>>> Anyone? >>>>>> >>>>>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>>>>> Hello guys. >>>>>>> Quite a while ago i mentioned that we have a big expel >>>>>>> issue on our gss ( first gen) and white a lot people >>>>>>> suggested that the root cause could be that we use the >>>>>>> same interface for all the traffic, and that we should >>>>>>> split the data network from the admin network. Finally >>>>>>> we could plan a downtime and we are migrating the data >>>>>>> out so, i can soon safelly play with the change, but >>>>>>> looking what exactly i should to do i'm a bit puzzled. >>>>>>> Our mmlscluster looks like this: >>>>>>> >>>>>>> GPFS cluster information >>>>>>> ======================== >>>>>>> GPFS cluster name: GSS.ebi.ac.uk >>>>>>> >>>>>>> GPFS cluster id: 17987981184946329605 >>>>>>> GPFS UID domain: GSS.ebi.ac.uk >>>>>>> >>>>>>> Remote shell command: /usr/bin/ssh >>>>>>> Remote file copy command: /usr/bin/scp >>>>>>> >>>>>>> GPFS cluster configuration servers: >>>>>>> ----------------------------------- >>>>>>> Primary server: gss01a.ebi.ac.uk >>>>>>> >>>>>>> Secondary server: gss02b.ebi.ac.uk >>>>>>> >>>>>>> >>>>>>> Node Daemon node name IP address Admin >>>>>>> node name Designation >>>>>>> ----------------------------------------------------------------------- >>>>>>> 1 gss01a.ebi.ac.uk >>>>>>> 10.7.28.2 >>>>>>> gss01a.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 2 gss01b.ebi.ac.uk >>>>>>> 10.7.28.3 >>>>>>> gss01b.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 3 gss02a.ebi.ac.uk >>>>>>> 10.7.28.67 >>>>>>> gss02a.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 4 gss02b.ebi.ac.uk >>>>>>> 10.7.28.66 >>>>>>> gss02b.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 5 gss03a.ebi.ac.uk >>>>>>> 10.7.28.34 >>>>>>> gss03a.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 6 gss03b.ebi.ac.uk >>>>>>> 10.7.28.35 >>>>>>> gss03b.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> >>>>>>> >>>>>>> It was my understanding that the "admin node" should use >>>>>>> a different interface ( a 1g link copper should be >>>>>>> fine), while the daemon node is where the data was >>>>>>> passing , so should point to the bonded 10g interfaces. >>>>>>> but when i read the mmchnode man page i start to be >>>>>>> quite confused. 
It says: >>>>>>> >>>>>>> --daemon-interface={hostname | ip_address} >>>>>>> Specifies the host name or IP address _*to be used by >>>>>>> the GPFS daemons for node-to-node communication*_. The >>>>>>> host name or IP address must refer to the communication >>>>>>> adapter over which the GPFS daemons communicate. >>>>>>> Alias interfaces are not allowed. Use the >>>>>>> original address or a name that is resolved by the host >>>>>>> command to that original address. >>>>>>> >>>>>>> --admin-interface={hostname | ip_address} >>>>>>> Specifies the name of the node to be used by GPFS >>>>>>> administration commands when communicating between >>>>>>> nodes. The admin node name must be specified as an IP >>>>>>> address or a hostname that is resolved by the host command >>>>>>> tothe desired IP address. If the keyword >>>>>>> DEFAULT is specified, the admin interface for the node >>>>>>> is set to be equal to the daemon interface for the node. >>>>>>> >>>>>>> What exactly means "node-to node-communications" ? >>>>>>> Means DATA or also the "lease renew", and the token >>>>>>> communication between the clients to get/steal the locks >>>>>>> to be able to manage concurrent write to thr same file? >>>>>>> Since we are getting expells ( especially when several >>>>>>> clients contends the same file ) i assumed i have to >>>>>>> split this type of packages from the data stream, but >>>>>>> reading the documentation it looks to me that those >>>>>>> internal comunication between nodes use the >>>>>>> daemon-interface wich i suppose are used also for the >>>>>>> data. so HOW exactly i can split them? >>>>>>> >>>>>>> >>>>>>> Thanks in advance, >>>>>>> Salvatore >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> gpfsug-discuss mailing list >>>>>>> gpfsug-discuss atgpfsug.org >>>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>>> >>>>>> _______________________________________________ >>>>>> gpfsug-discuss mailing list >>>>>> gpfsug-discuss at gpfsug.org >>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> -- >> This communication contains confidential information intended >> only for the persons to whom it is addressed. Any other >> distribution, copying or disclosure is strictly prohibited. If >> you have received this communication in error, please notify the >> sender and delete this e-mail message immediately. >> >> Le pr?sent message contient des renseignements de nature >> confidentielle r?serv?s uniquement ? l'usage du destinataire. >> Toute diffusion, distribution, divulgation, utilisation ou >> reproduction de la pr?sente communication, et de tout fichier qui >> y est joint, est strictement interdite. Si vous avez re?u le >> pr?sent message ?lectronique par erreur, veuillez informer >> imm?diatement l'exp?diteur et supprimer le message de votre >> ordinateur et de votre serveur. 
>> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss atgpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -- > This communication contains confidential information intended only for > the persons to whom it is addressed. Any other distribution, copying > or disclosure is strictly prohibited. If you have received this > communication in error, please notify the sender and delete this > e-mail message immediately. > > Le pr?sent message contient des renseignements de nature > confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute > diffusion, distribution, divulgation, utilisation ou reproduction de > la pr?sente communication, et de tout fichier qui y est joint, est > strictement interdite. Si vous avez re?u le pr?sent message > ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur > et supprimer le message de votre ordinateur et de votre serveur. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 28904 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Jul 27 22:24:11 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 27 Jul 2015 21:24:11 +0000 Subject: [gpfsug-discuss] GPFS and Community Scientific Cloud Message-ID: Hi Ulf, Thanks for the email, as suggested, I'm copying this to the GPFS UG mailing list as well as I'm sure the discussion is of interest to others. I guess what we're looking to do is to have arbitrary VMs running provided by users (I.e. Completely untrusted), but to provide them a way to get secure access to only their data. Right now we can't give them a GPFS client as this is too trusting, I was wondering how easy it would be for us to implement something like: User has a VM User runs 'kinit user at DOMAIN' to gain kerberos ticket and can then securely gain access to only their files from my NFS server. I also mentioned Janet ASSENT, which is a relatively recent project: https://jisc.ac.uk/assent (It was piloted as Janet Moonshot). Which builds on top of SAML to provide other software access to federation. My understanding is that site-specific UID mapping is needed (e.g. On the NFS/GPFS server). Simon >I have some experience with the following questions: > >> NFS just isn?t built for security really. I guess NFSv4 with KRB5 is >> one option to look at, with user based credentials. That might just >> about be feasible if the user were do authenticate with kinit before >> being able to access NFSv4 mounted files. I.e. Its done at the user >> level rather than the instance level. That might be an interesting >> project as a feasibility study to look at, will it work? How would >> we integrate into a federated access management system (something >> like UK Federation and ABFAB/Moonshot/Assent maybe?). Could we >> provide easy steps for a user in a VM to follow? Can we even make it >> work with Ganesha in such an environment? > > >Kerberized NFSv3 and Kerberized NFSv4 provide nearly the same level of >security. 
Kerberos makes the difference and not the NFS version. I have >posted some background information to the GPFS forum: >http://ibm.co/1VFLUR4 > >Kerberized NFSv4 has the advantage that it allows different UID/GID ranges >on NFS server and NFS client. I have led a proof-of-concept where we have >used this feature to provide secure data access to personalized patient >data for multiple tenants where the tenants had conflicting UID/GID >ranges. >I have some material which I will share via the GPFS forum. > >UK Federation seems to be based on SAML/Shibboleth. Unfortunately there is >no easy integration of network file protocols such as NFS and SMB and >SAML/Shibboleth, because file protocols require attributes which are >typically not stored in SAML/Shibboleth. Fortunately I provided technical >guidance to a customer who exactly implemented this integration in order >to >provide secure file service to multiple universities, again with >conflicting UID/GID ranges. I need some time to write it up and publish >it. From martin.gasthuber at desy.de Tue Jul 28 17:28:44 2015 From: martin.gasthuber at desy.de (Martin Gasthuber) Date: Tue, 28 Jul 2015 18:28:44 +0200 Subject: [gpfsug-discuss] fast ACL alter solution Message-ID: Hi, since a few months we're running a new infrastructure, with the core built on GPFS (4.1.0.8), for 'light source - X-Rays' experiments local at the site. The system is used for the data acquisition chain, data analysis, data exports and archive. Right now we got new detector types (homebuilt, experimental) generating millions of small files - the last run produced ~9 million files at 64 to 128K in size ;-). In our setup, the files gets copied to a (user accessible) GPFS instance which controls the access by NFSv4 ACLs (only !) and from time to time, we had to modify these ACLs (add/remove user/group etc.). Doing a (non policy-run based) simple approach, changing 9 million files requires ~200 hours to run - which we consider not really a good option. Running mmgetacl/mmputacl whithin a policy-run will clearly speed that up - but the biggest time consuming operations are the get and put ACL ops. Is anybody aware of any faster ACL access operation (whithin the policy-run) - or even a 'mod-acl' operation ? best regards, Martin From jonathan at buzzard.me.uk Tue Jul 28 19:06:30 2015 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 28 Jul 2015 19:06:30 +0100 Subject: [gpfsug-discuss] fast ACL alter solution In-Reply-To: References: Message-ID: <55B7C4A6.9020205@buzzard.me.uk> On 28/07/15 17:28, Martin Gasthuber wrote: > Hi, > > since a few months we're running a new infrastructure, with the core > built on GPFS (4.1.0.8), for 'light source - X-Rays' experiments > local at the site. The system is used for the data acquisition chain, > data analysis, data exports and archive. Right now we got new > detector types (homebuilt, experimental) generating millions of small > files - the last run produced ~9 million files at 64 to 128K in size > ;-). In our setup, the files gets copied to a (user accessible) GPFS > instance which controls the access by NFSv4 ACLs (only !) and from > time to time, we had to modify these ACLs (add/remove user/group > etc.). Doing a (non policy-run based) simple approach, changing 9 > million files requires ~200 hours to run - which we consider not > really a good option. Running mmgetacl/mmputacl whithin a policy-run > will clearly speed that up - but the biggest time consuming > operations are the get and put ACL ops. 
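(For context, the per-file step inside such a policy run is essentially a get/edit/put round trip, something like the following - the path is invented and this is only a sketch:

   mmgetacl -k nfs4 -o /tmp/acl.txt /gpfs/instrument/run042/frame_000001.dat
   # edit /tmp/acl.txt to add or remove the user/group ACL entries
   mmputacl -i /tmp/acl.txt /gpfs/instrument/run042/frame_000001.dat

i.e. two separate commands and two ACL operations for every single file.)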
Is anybody aware of any > faster ACL access operation (whithin the policy-run) - or even a > 'mod-acl' operation ? > In the past IBM have said that their expectations are that the ACL's are set via Windows on remote workstations and not from the command line on the GPFS servers themselves!!! Crazy I know. There really needs to be a mm version of the NFSv4 setfacl/nfs4_getfacl commands that ideally makes use of the fast inode traversal features to make things better. In the past I wrote some C code that set specific ACL's on files. This however was to deal with migrating files onto a system and needed to set initial ACL's and didn't make use of the fast traversal features and is completely unpolished. A good starting point would probably be the FreeBSD setfacl/getfacl tools, that at least was my plan but I have never gotten around to it. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From TROPPENS at de.ibm.com Wed Jul 29 09:02:59 2015 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 29 Jul 2015 10:02:59 +0200 Subject: [gpfsug-discuss] GPFS and Community Scientific Cloud In-Reply-To: References: Message-ID: Hi Simon, I have started to draft a response, but it gets longer and longer. I need some more time to respond. Best regards, Ulf. -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 27.07.2015 23:24 Subject: Re: [gpfsug-discuss] GPFS and Community Scientific Cloud Sent by: gpfsug-discuss-bounces at gpfsug.org Hi Ulf, Thanks for the email, as suggested, I'm copying this to the GPFS UG mailing list as well as I'm sure the discussion is of interest to others. I guess what we're looking to do is to have arbitrary VMs running provided by users (I.e. Completely untrusted), but to provide them a way to get secure access to only their data. Right now we can't give them a GPFS client as this is too trusting, I was wondering how easy it would be for us to implement something like: User has a VM User runs 'kinit user at DOMAIN' to gain kerberos ticket and can then securely gain access to only their files from my NFS server. I also mentioned Janet ASSENT, which is a relatively recent project: https://jisc.ac.uk/assent (It was piloted as Janet Moonshot). Which builds on top of SAML to provide other software access to federation. My understanding is that site-specific UID mapping is needed (e.g. On the NFS/GPFS server). Simon >I have some experience with the following questions: > >> NFS just isn?t built for security really. I guess NFSv4 with KRB5 is >> one option to look at, with user based credentials. That might just >> about be feasible if the user were do authenticate with kinit before >> being able to access NFSv4 mounted files. I.e. Its done at the user >> level rather than the instance level. That might be an interesting >> project as a feasibility study to look at, will it work? How would >> we integrate into a federated access management system (something >> like UK Federation and ABFAB/Moonshot/Assent maybe?). Could we >> provide easy steps for a user in a VM to follow? Can we even make it >> work with Ganesha in such an environment? 
> > >Kerberized NFSv3 and Kerberized NFSv4 provide nearly the same level of >security. Kerberos makes the difference and not the NFS version. I have >posted some background information to the GPFS forum: >http://ibm.co/1VFLUR4 > >Kerberized NFSv4 has the advantage that it allows different UID/GID ranges >on NFS server and NFS client. I have led a proof-of-concept where we have >used this feature to provide secure data access to personalized patient >data for multiple tenants where the tenants had conflicting UID/GID >ranges. >I have some material which I will share via the GPFS forum. > >UK Federation seems to be based on SAML/Shibboleth. Unfortunately there is >no easy integration of network file protocols such as NFS and SMB and >SAML/Shibboleth, because file protocols require attributes which are >typically not stored in SAML/Shibboleth. Fortunately I provided technical >guidance to a customer who exactly implemented this integration in order >to >provide secure file service to multiple universities, again with >conflicting UID/GID ranges. I need some time to write it up and publish >it.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From chair at gpfsug.org Thu Jul 30 21:36:07 2015
From: chair at gpfsug.org (chair-gpfsug.org)
Date: Thu, 30 Jul 2015 21:36:07 +0100
Subject: [gpfsug-discuss] July Meet the devs
Message-ID: 

I've heard some great feedback about the July meet the devs held at IBM Warwick this week. Thanks to Ross and Patrick at IBM and Clare for coordinating the registration for this! Jez has a few photos so we'll try and get those uploaded in the next week or so to the website.

Simon (GPFS UG Chair)

From chris.hunter at yale.edu Wed Jul 1 16:52:07 2015
From: chris.hunter at yale.edu (Chris Hunter)
Date: Wed, 01 Jul 2015 11:52:07 -0400
Subject: [gpfsug-discuss] gpfs rdma expels
Message-ID: <55940CA7.9010506@yale.edu>

Hi UG list,
We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of clients use RDMA. We see a large number of expels of rdma clients but less of the tcp clients.
Most of the gpfs config is at defaults. We are unclear if any of the non-RDMA config items (eg.
Idle socket timeout) would help our issue. Any suggestions on gpfs config parameters we should investigate ? thank-you in advance, chris hunter yale hpc group From viccornell at gmail.com Wed Jul 1 16:58:31 2015 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 1 Jul 2015 16:58:31 +0100 Subject: [gpfsug-discuss] gpfs rdma expels In-Reply-To: <55940CA7.9010506@yale.edu> References: <55940CA7.9010506@yale.edu> Message-ID: <6E28C0FB-2F99-4127-B1F2-272BA2532330@gmail.com> If it used to work then its probably not config. Most expels are the result of network connectivity problems. If your cluster is not too big try looking at ping from every node to every other node and look for large latencies. Also look to see who is expelling who. Ie - if your RDMA nodes are being expelled by non-RDMA nodes. It may point to a weakness in your network which GPFS ,being as it is a great finder of weaknesses, is having a problem with. Also more details (network config etc) will elicit more detailed suggestions. Cheers, Vic > On 1 Jul 2015, at 16:52, Chris Hunter wrote: > > Hi UG list, > We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of clients use RDMA. We see a large number of expels of rdma clients but less of the tcp clients. > Most of the gpfs config is at defaults. We are unclear if any of the non-RDMA config items (eg. Idle socket timeout) would help our issue. Any suggestions on gpfs config parameters we should investigate ? > > thank-you in advance, > chris hunter > yale hpc group > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stijn.deweirdt at ugent.be Thu Jul 2 07:42:30 2015 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Thu, 02 Jul 2015 08:42:30 +0200 Subject: [gpfsug-discuss] gpfs rdma expels In-Reply-To: <6E28C0FB-2F99-4127-B1F2-272BA2532330@gmail.com> References: <55940CA7.9010506@yale.edu> <6E28C0FB-2F99-4127-B1F2-272BA2532330@gmail.com> Message-ID: <5594DD56.6010302@ugent.be> do you use ipoib for the rdma nodes or regular ethernet? and what OS are you on? we had issue with el7.1 kernel and ipoib; there's packet loss with ipoib and mlnx_ofed (and mlnx engineering told that it might be in basic ofed from el7.1 too). 7.0 kernels are ok) and client expels were the first signs on our setup. stijn On 07/01/2015 05:58 PM, Vic Cornell wrote: > If it used to work then its probably not config. Most expels are the result of network connectivity problems. > > If your cluster is not too big try looking at ping from every node to every other node and look for large latencies. > > Also look to see who is expelling who. Ie - if your RDMA nodes are being expelled by non-RDMA nodes. It may point to a weakness in your network which GPFS ,being as it is a great finder of weaknesses, is having a problem with. > > Also more details (network config etc) will elicit more detailed suggestions. > > Cheers, > > Vic > > > >> On 1 Jul 2015, at 16:52, Chris Hunter wrote: >> >> Hi UG list, >> We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of clients use RDMA. We see a large number of expels of rdma clients but less of the tcp clients. >> Most of the gpfs config is at defaults. We are unclear if any of the non-RDMA config items (eg. Idle socket timeout) would help our issue. Any suggestions on gpfs config parameters we should investigate ? 
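(As a rough, generic starting point for the checks Vic suggests above, a short sketch using only standard GPFS commands and the standard log location; nothing here is specific to this cluster.)

grep -i expel /var/adm/ras/mmfs.log.latest   # typically on the cluster manager: shows which node expelled which
mmdiag --network                             # per-node view of the daemon connections and their state
mmlsconfig | grep -i verbs                   # confirm the verbsRdma/verbsPorts settings actually in effect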
>> >> thank-you in advance, >> chris hunter >> yale hpc group >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From Daniel.Vogel at abcsystems.ch Thu Jul 2 08:12:32 2015 From: Daniel.Vogel at abcsystems.ch (Daniel Vogel) Date: Thu, 2 Jul 2015 07:12:32 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> Message-ID: <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS]Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? 
The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From chris.howarth at citi.com Thu Jul 2 08:24:37 2015 From: chris.howarth at citi.com (Howarth, Chris ) Date: Thu, 2 Jul 2015 07:24:37 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> Message-ID: <0609A0AC1B1CA9408D88D4144C5C990B75D89CF5@EXLNMB52.eur.nsroot.net> Daniel ?in our environment we have data and metadata split out onto separate drives in separate servers. We also set the GPFS parameter ?mmchconfig defaultHelperNodes=?list_of_metadata_servers? which will automatically only use these nodes for the scan for restriping/rebalancing data (rather than having to specify the ?N option). This dramatically reduced the impact to clients accessing the data nodes while these activities are taking place. Also using SSDs for metadata nodes can make a big improvement. Chris From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Daniel Vogel Sent: Thursday, July 02, 2015 8:13 AM To: 'gpfsug main discussion list' Subject: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . 
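As a minimal sketch of the two approaches discussed in this thread (running the restripe with -N, and the defaultHelperNodes setting Chris mentions), where the node and file system names are examples only:

mmchconfig defaultHelperNodes=helper01,helper02   # maintenance scans default to nodes with no performance-critical role
mmrestripefs gpfs0 -b -N helper01,helper02        # or name the participating nodes explicitly on the restripe itself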
sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS]Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From chris.hunter at yale.edu Thu Jul 2 14:01:53 2015 From: chris.hunter at yale.edu (Chris Hunter) Date: Thu, 02 Jul 2015 09:01:53 -0400 Subject: [gpfsug-discuss] gpfs rdma expels Message-ID: <55953641.4010701@yale.edu> Thanks for the feedback. Our network is non-uniform, we have three (uniform) rdma networks connected by narrow uplinks. Previously we used gpfs on one network, now we wish to expand to the other networks. Previous experience shows we see "PortXmitWait" messages from traffic over the narrow uplinks. We find expels happen often from gpfs communication over the narrow uplinks. We acknowledge an inherent weakness with narrow uplinks but for practical reasons it would be difficult to resolve. So the question, is it possible to configure gpfs to be tolerant of non-uniform networks with narrow uplinks ? thanks, chris hunter > On 1 Jul 2015, at 16:52, Chris Hunter wrote: > > Hi UG list, > We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of > clients use RDMA. We see a large number of expels of rdma clients but > less of the tcp clients. Most of the gpfs config is at defaults. We > are unclear if any of the non-RDMA config items (eg. Idle socket > timeout) would help our issue. Any suggestions on gpfs config > parameters we should investigate ? From S.J.Thompson at bham.ac.uk Thu Jul 2 16:43:03 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 2 Jul 2015 15:43:03 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support Message-ID: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? 
>From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon From GARWOODM at uk.ibm.com Thu Jul 2 16:55:42 2015 From: GARWOODM at uk.ibm.com (Michael Garwood7) Date: Thu, 2 Jul 2015 16:55:42 +0100 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Simon, 1. Most of the chef recipes involve installing the various packages required for the protocols, and some of the new performance monitoring packages required for mmperfquery. There is a series of steps for proper manual install at http://www-01.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.adv.doc/bl1adv_ces_features.htm but this assumes you have all IBM Samba RPMs and prerequisites installed. The recipes *should* be split out so that at the very least, RPM install is done in its own recipe without configuring or enabling anything... 2. I am not 100% sure what deploying IBM Samba on the cluster will do with regards to sernet samba. As far as I am aware there is no code in the installer or chef recipes to check for other samba deployments running but I may be mistaken. Depending on how sernet samba hooks to GPFS I can't think of any reason why it would cause problems aside from the risk of the protocols not communicating and causing issues with file locks/data overwrites, depending on what workload you have running on samba. 3. I haven't personally seen multicluster deployments done or tested before, but no, it is not officially supported. The installer has been written with the assumption that you are installing to one cluster, so I wouldn't recommend trying with multiple clusters - unforseen consequences :) Regards, Michael Garwood IBM Systems Developer Phone: 44-161-905-4118 E-mail: GARWOODM at uk.ibm.com 40 Blackfriars Street Manchester, M3 2EG United Kingdom IBM United Kingdom Limited Registered in England and Wales with number 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list , Date: 02/07/2015 16:43 Subject: [gpfsug-discuss] 4.1.1 protocol support Sent by: gpfsug-discuss-bounces at gpfsug.org Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. 
My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Thu Jul 2 17:02:01 2015 From: oester at gmail.com (Bob Oesterlin) Date: Thu, 2 Jul 2015 11:02:01 -0500 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. You do have the option of doing things manually as well. The guide that describes it is pretty good. If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Hi, > > Just wondering if anyone has looked at the new protocol support stuff in > 4.1.1 yet? > > From what I can see, it wants to use the installer to add things like IBM > Samba onto nodes in the cluster. The docs online seem to list manual > installation as running the chef template, which is hardly manual... > > 1. Id like to know what is being run on my cluster > 2. Its an existing install which was using sernet samba, so I don't want > to go out and break anything inadvertently > 3. My protocol nodes are in a multicluster, and I understand the installer > doesn't support multicluster. > > (the docs state that multicluster isn't supported but something like its > expected to work). > > So... Has anyone had a go at this yet and have a set of steps? > > I've started unpicking the chef recipe, but just wondering if anyone had > already had a go at this? > > (and lets not start on the mildy bemusing error when you "enable" the > service with "mmces service enable" (ces service not enabled) - there's > other stuff to enable it)... > > Simon > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Thu Jul 2 19:52:28 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 2 Jul 2015 18:52:28 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Michael, Thanks for that link. This is the docs I?d found before: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_manualprotocols.htm I guess one of the reasons for wanting to unpick is because we already have configuration management tools all in place. I have no issue about GPFS config being inside GPFS, but we really need to know what is going on (and we can manage to get the RPMs all on etc if we know what is needed from the config management tool). I do note that it needs CCR enabled, which we currently don?t have. Now I think this was because we saw issues with mmsdrestore when adding a node that had been reinstalled back into the cluster. I need to check if that is still the case (we work on being able to pull clients, NSDs etc from the cluster and using xcat to reprovision and the a config tool to do the relevant bits to rejoin the cluster ? makes it easier for us to stage kernel, GPFS, OFED updates as we just blat on a new image). I don?t really want to have a mix of Sernet and IBM samba on there, so am happy to pull out those bits, but obviously need to get the IBM bits working as well. Multicluster ? well, our ?protocol? cluster is a separate cluster from the NSD cluster (can?t remote expel, might want to add other GPFS clusters to the protocol layer etc). Of course the multi cluster talks GPFS protocol, so I don?t see any reason why it shouldn?t work, but yes, noted its not supported. Simon From: Michael Garwood7 > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 16:55 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon, 1. Most of the chef recipes involve installing the various packages required for the protocols, and some of the new performance monitoring packages required for mmperfquery. There is a series of steps for proper manual install at http://www-01.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.adv.doc/bl1adv_ces_features.htm but this assumes you have all IBM Samba RPMs and prerequisites installed. The recipes *should* be split out so that at the very least, RPM install is done in its own recipe without configuring or enabling anything... 2. I am not 100% sure what deploying IBM Samba on the cluster will do with regards to sernet samba. As far as I am aware there is no code in the installer or chef recipes to check for other samba deployments running but I may be mistaken. Depending on how sernet samba hooks to GPFS I can't think of any reason why it would cause problems aside from the risk of the protocols not communicating and causing issues with file locks/data overwrites, depending on what workload you have running on samba. 3. I haven't personally seen multicluster deployments done or tested before, but no, it is not officially supported. 
The installer has been written with the assumption that you are installing to one cluster, so I wouldn't recommend trying with multiple clusters - unforseen consequences :) Regards, Michael Garwood IBM Systems Developer ________________________________ Phone: 44-161-905-4118 E-mail: GARWOODM at uk.ibm.com 40 Blackfriars Street Manchester, M3 2EG United Kingdom IBM United Kingdom Limited Registered in England and Wales with number 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU From: "Simon Thompson (Research Computing - IT Services)" > To: gpfsug main discussion list >, Date: 02/07/2015 16:43 Subject: [gpfsug-discuss] 4.1.1 protocol support Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 2 19:58:12 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 2 Jul 2015 18:58:12 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Bob, Thanks, I?ll have a look through the link Michael sent me and shout if I get stuck? Looks a bit different to the previous way were we running this with ctdb etc. Our protocol nodes are already running 7.1 (though CentOS which means the mmbuildgpl command doesn?t work, would be much nice of course if the init script detected the kernel had changed and did a build etc automagically ?). Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 17:02 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. You do have the option of doing things manually as well. The guide that describes it is pretty good. 
If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Thu Jul 2 20:03:02 2015 From: oester at gmail.com (Bob Oesterlin) Date: Thu, 2 Jul 2015 14:03:02 -0500 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: On Thu, Jul 2, 2015 at 1:52 PM, Simon Thompson (Research Computing - IT Services) wrote: > I do note that it needs CCR enabled, which we currently don?t have. Now I > think this was because we saw issues with mmsdrestore when adding a node > that had been reinstalled back into the cluster. I need to check if that is > still the case (we work on being able to pull clients, NSDs etc from the > cluster and using xcat to reprovision and the a config tool to do the > relevant bits to rejoin the cluster ? makes it easier for us to stage > kernel, GPFS, OFED updates as we just blat on a new image). > Yes, and this is why we couldn't use CCR - our compute nodes are netboot, so they go thru a mmsdrrestore every time they reboot. Now, they have fixed this in 4.1.1, which means if you can get (the cluster) to 4.1.1 and turn on CCR, mmsdrrestore should work. Note to self: Test this out in your sandbox cluster. :-) Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 3 12:22:06 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 3 Jul 2015 11:22:06 +0000 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) Message-ID: Bob, (anyone?) Have you tried mmsdrestore to see if its working in 4.1.1? # mmsdrrestore -p PRIMARY -R /usr/bin/scp Fri 3 Jul 11:56:05 BST 2015: mmsdrrestore: Processing node PRIMARY ccrio initialization failed (err 811) mmsdrrestore: Unable to retrieve GPFS cluster files from CCR. mmsdrrestore: Unexpected error from updateMmfsEnvironment. Return code: 1 mmsdrrestore: Command failed. Examine previous error messages to determine cause. 
It seems to copy the mmsdrfs file to the local node into /var/mmfs/gen/mmsdrfs but then fails to actually work. Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 20:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support On Thu, Jul 2, 2015 at 1:52 PM, Simon Thompson (Research Computing - IT Services) > wrote: I do note that it needs CCR enabled, which we currently don?t have. Now I think this was because we saw issues with mmsdrestore when adding a node that had been reinstalled back into the cluster. I need to check if that is still the case (we work on being able to pull clients, NSDs etc from the cluster and using xcat to reprovision and the a config tool to do the relevant bits to rejoin the cluster ? makes it easier for us to stage kernel, GPFS, OFED updates as we just blat on a new image). Yes, and this is why we couldn't use CCR - our compute nodes are netboot, so they go thru a mmsdrrestore every time they reboot. Now, they have fixed this in 4.1.1, which means if you can get (the cluster) to 4.1.1 and turn on CCR, mmsdrrestore should work. Note to self: Test this out in your sandbox cluster. :-) Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 3 12:50:31 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 3 Jul 2015 11:50:31 +0000 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: Actually, no just ignore me, it does appear to be fixed in 4.1.1 * I cleaned up the node by removing the 4.1.1 packages, then cleaned up /var/mmfs, but then when the config tool reinstalled, it put 4.1.0 back on and didn?t apply the updates to 4.1.1, so it must have been an older version of mmsdrrestore Simon From: Simon Thompson > Reply-To: gpfsug main discussion list > Date: Friday, 3 July 2015 12:22 To: gpfsug main discussion list > Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) Bob, (anyone?) Have you tried mmsdrestore to see if its working in 4.1.1? # mmsdrrestore -p PRIMARY -R /usr/bin/scp Fri 3 Jul 11:56:05 BST 2015: mmsdrrestore: Processing node PRIMARY ccrio initialization failed (err 811) mmsdrrestore: Unable to retrieve GPFS cluster files from CCR. mmsdrrestore: Unexpected error from updateMmfsEnvironment. Return code: 1 mmsdrrestore: Command failed. Examine previous error messages to determine cause. It seems to copy the mmsdrfs file to the local node into /var/mmfs/gen/mmsdrfs but then fails to actually work. Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 20:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support On Thu, Jul 2, 2015 at 1:52 PM, Simon Thompson (Research Computing - IT Services) > wrote: I do note that it needs CCR enabled, which we currently don?t have. Now I think this was because we saw issues with mmsdrestore when adding a node that had been reinstalled back into the cluster. I need to check if that is still the case (we work on being able to pull clients, NSDs etc from the cluster and using xcat to reprovision and the a config tool to do the relevant bits to rejoin the cluster ? makes it easier for us to stage kernel, GPFS, OFED updates as we just blat on a new image). 
Yes, and this is why we couldn't use CCR - our compute nodes are netboot, so they go thru a mmsdrrestore every time they reboot. Now, they have fixed this in 4.1.1, which means if you can get (the cluster) to 4.1.1 and turn on CCR, mmsdrrestore should work. Note to self: Test this out in your sandbox cluster. :-) Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Jul 3 13:21:43 2015 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 3 Jul 2015 07:21:43 -0500 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: On Fri, Jul 3, 2015 at 6:22 AM, Simon Thompson (Research Computing - IT Services) wrote: > Have you tried mmsdrestore to see if its working in 4.1.1? Well, no actually :) They told me it was fixed but I have never got 'round to checking it during my beta testing. If it's not, I say submit a PMR and let's get them to fix it - I will do the same. It would be nice to actually use CCR, especially if the new protocol support depends on it. Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Jul 3 13:22:37 2015 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 3 Jul 2015 07:22:37 -0500 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: On Fri, Jul 3, 2015 at 6:22 AM, Simon Thompson (Research Computing - IT Services) wrote: > Have you tried mmsdrestore to see if its working in 4.1.1? One thing - did you try this on a pure 4.1.1 cluster with release=LATEST? Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 3 13:28:10 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 3 Jul 2015 12:28:10 +0000 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: It was on a pure cluster with 4.1.1 only. (I had to do that a precursor to start enabling CES). As I mentioned, I messed up with 4.1.0 client installed so it doesn?t work from a mixed version, but did work from pure 4.1.1 Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Friday, 3 July 2015 13:22 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) On Fri, Jul 3, 2015 at 6:22 AM, Simon Thompson (Research Computing - IT Services) > wrote: Have you tried mmsdrestore to see if its working in 4.1.1? One thing - did you try this on a pure 4.1.1 cluster with release=LATEST? Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Fri Jul 3 23:48:38 2015 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 3 Jul 2015 15:48:38 -0700 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch><201507011422.t61EMZmw011626@d01av01.pok.ibm.com> <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> Message-ID: <201507032249.t63Mnffp025995@d03av03.boulder.ibm.com> this triggers a few questions 1. have you tried running it only on a node that doesn't serve NFS data ? 2. what NFS stack are you using ? 
is this the kernel NFS Server as part of linux means you use cNFS ? if the answer to 2 is yes, have you adjusted the nfsd threads in /etc/sysconfig/nfs ? the default is only 8 and if you run with the default you have a very low number of threads from the outside competing with a larger number of threads doing restripe, increasing the nfsd threads could help. you could also reduce the number of internal restripe threads to try out if that helps mitigating the impact. to try an extreme low value set the following : mmchconfig pitWorkerThreadsPerNode=1 -i and retry the restripe again, to reset it back to default run mmchconfig pitWorkerThreadsPerNode=DEFAULT -i sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Daniel Vogel To: "'gpfsug main discussion list'" Date: 07/02/2015 12:12 AM Subject: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoSDaniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel To: "'gpfsug-discuss at gpfsug.org'" Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. 
These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Jul 6 11:09:08 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 6 Jul 2015 10:09:08 +0000 Subject: [gpfsug-discuss] SMB support and config Message-ID: Hi, (sorry, lots of questions about this stuff at the moment!) I?m currently looking at removing the sernet smb configs we had previously and moving to IBM SMB. I?ve removed all the old packages and only now have gpfs.smb installed on the systems. I?m struggling to get the config tools to work for our environment. We have MS Windows AD Domain for authentication. For various reasons, however doesn?t hold the UIDs/GIDs, which are instead held in a different LDAP directory. In the past, we?d configure the Linux servers running Samba so that NSLCD was configured to get details from the LDAP server. (e.g. getent passwd would return the data for an AD user). The Linux boxes would also be configured to use KRB5 authentication where users were allowed to ssh etc in for password authentication. So as far as Samba was concerned, it would do ?security = ADS? and then we?d also have "idmap config * : backend = tdb2? I.e. Use Domain for authentication, but look locally for ID mapping data. Now I can configured IBM SMB to use ADS for authentication: mmuserauth service create --type ad --data-access-method file --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF --idmap-role subordinate However I can?t see anyway for me to manipulate the config so that it doesn?t use autorid. Using this we end up with: mmsmb config list | grep -i idmap idmap config * : backend autorid idmap config * : range 10000000-299999999 idmap config * : rangesize 1000000 idmap config * : read only yes idmap:cache no It also adds: mmsmb config list | grep -i auth auth methods guest sam winbind (though I don?t think that is a problem). I also can?t change the idmap using the mmsmb command (I think would look like this): # mmsmb config change --option="idmap config * : backend=tdb2" idmap config * : backend=tdb2: [E] Unsupported smb option. More information about smb options is availabe in the man page. I can?t see anything in the docs at: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm That give me a clue how to do what I want. I?d be happy to do some mixture of AD for authentication and LDAP for lookups (rather than just falling back to ?local? 
from nslcd), but I can?t see a way to do this, and ?manual? seems to stop ADS authentication in Samba. Anyone got any suggestions? Thanks Simon From kallbac at iu.edu Mon Jul 6 23:06:00 2015 From: kallbac at iu.edu (Kallback-Rose, Kristy A) Date: Mon, 6 Jul 2015 22:06:00 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: References: Message-ID: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Just to chime in as another interested party, we do something fairly similar but use sssd instead of nslcd. Very interested to see how accommodating the IBM Samba is to local configuration needs. Best, Kristy On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT Services) wrote: > Hi, > > (sorry, lots of questions about this stuff at the moment!) > > I?m currently looking at removing the sernet smb configs we had previously > and moving to IBM SMB. I?ve removed all the old packages and only now have > gpfs.smb installed on the systems. > > I?m struggling to get the config tools to work for our environment. > > We have MS Windows AD Domain for authentication. For various reasons, > however doesn?t hold the UIDs/GIDs, which are instead held in a different > LDAP directory. > > In the past, we?d configure the Linux servers running Samba so that NSLCD > was configured to get details from the LDAP server. (e.g. getent passwd > would return the data for an AD user). The Linux boxes would also be > configured to use KRB5 authentication where users were allowed to ssh etc > in for password authentication. > > So as far as Samba was concerned, it would do ?security = ADS? and then > we?d also have "idmap config * : backend = tdb2? > > I.e. Use Domain for authentication, but look locally for ID mapping data. > > Now I can configured IBM SMB to use ADS for authentication: > > mmuserauth service create --type ad --data-access-method file > --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF > --idmap-role subordinate > > > However I can?t see anyway for me to manipulate the config so that it > doesn?t use autorid. Using this we end up with: > > mmsmb config list | grep -i idmap > idmap config * : backend autorid > idmap config * : range 10000000-299999999 > idmap config * : rangesize 1000000 > idmap config * : read only yes > idmap:cache no > > > It also adds: > > mmsmb config list | grep -i auth > auth methods guest sam winbind > > (though I don?t think that is a problem). > > > I also can?t change the idmap using the mmsmb command (I think would look > like this): > # mmsmb config change --option="idmap config * : backend=tdb2" > idmap config * : backend=tdb2: [E] Unsupported smb option. More > information about smb options is availabe in the man page. > > > > I can?t see anything in the docs at: > http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect > rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm > > That give me a clue how to do what I want. > > I?d be happy to do some mixture of AD for authentication and LDAP for > lookups (rather than just falling back to ?local? from nslcd), but I can?t > see a way to do this, and ?manual? seems to stop ADS authentication in > Samba. > > Anyone got any suggestions? 
> > > Thanks > > Simon > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Tue Jul 7 12:39:24 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 7 Jul 2015 11:39:24 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Message-ID: So based on what I?m seeing ... When you run mmstartup, the start process edits /etc/nsswitch.conf. I?ve managed to make it work in my environment, but I had to edit the file /usr/lpp/mmfs/bin/mmcesop to make it put ldap instead of winbind when it starts up. I also had to do some studious use of "net conf delparm? ? Which is probably not a good idea. I did try using: mmuserauth service create --type userdefined --data-access-method file And the setting the "security = ADS? parameters by hand with "net conf? (can?t do it with mmsmb), and a manual ?net ads join" but I couldn?t get it to authenticate clients properly. I can?t work out why just at the moment. But even then when mmshutdown runs, it still goes ahead and edits /etc/nsswitch.conf I?ve got a ticket open with IBM at the moment via our integrator to see what they say. But I?m not sure I like something going off and poking things like /etc/nsswitch.conf at startup/shutdown. I can sorta see that at config time, but when service start etc, I?m not sure I really like that idea! Simon On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: >Just to chime in as another interested party, we do something fairly >similar but use sssd instead of nslcd. Very interested to see how >accommodating the IBM Samba is to local configuration needs. > >Best, >Kristy > >On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >Services) wrote: > >> Hi, >> >> (sorry, lots of questions about this stuff at the moment!) >> >> I?m currently looking at removing the sernet smb configs we had >>previously >> and moving to IBM SMB. I?ve removed all the old packages and only now >>have >> gpfs.smb installed on the systems. >> >> I?m struggling to get the config tools to work for our environment. >> >> We have MS Windows AD Domain for authentication. For various reasons, >> however doesn?t hold the UIDs/GIDs, which are instead held in a >>different >> LDAP directory. >> >> In the past, we?d configure the Linux servers running Samba so that >>NSLCD >> was configured to get details from the LDAP server. (e.g. getent passwd >> would return the data for an AD user). The Linux boxes would also be >> configured to use KRB5 authentication where users were allowed to ssh >>etc >> in for password authentication. >> >> So as far as Samba was concerned, it would do ?security = ADS? and then >> we?d also have "idmap config * : backend = tdb2? >> >> I.e. Use Domain for authentication, but look locally for ID mapping >>data. >> >> Now I can configured IBM SMB to use ADS for authentication: >> >> mmuserauth service create --type ad --data-access-method file >> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >> --idmap-role subordinate >> >> >> However I can?t see anyway for me to manipulate the config so that it >> doesn?t use autorid. 
Using this we end up with: >> >> mmsmb config list | grep -i idmap >> idmap config * : backend autorid >> idmap config * : range 10000000-299999999 >> idmap config * : rangesize 1000000 >> idmap config * : read only yes >> idmap:cache no >> >> >> It also adds: >> >> mmsmb config list | grep -i auth >> auth methods guest sam winbind >> >> (though I don?t think that is a problem). >> >> >> I also can?t change the idmap using the mmsmb command (I think would >>look >> like this): >> # mmsmb config change --option="idmap config * : backend=tdb2" >> idmap config * : backend=tdb2: [E] Unsupported smb option. More >> information about smb options is availabe in the man page. >> >> >> >> I can?t see anything in the docs at: >> >>http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>ct >> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >> >> That give me a clue how to do what I want. >> >> I?d be happy to do some mixture of AD for authentication and LDAP for >> lookups (rather than just falling back to ?local? from nslcd), but I >>can?t >> see a way to do this, and ?manual? seems to stop ADS authentication in >> Samba. >> >> Anyone got any suggestions? >> >> >> Thanks >> >> Simon >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From TROPPENS at de.ibm.com Thu Jul 9 07:55:24 2015 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Thu, 9 Jul 2015 08:55:24 +0200 Subject: [gpfsug-discuss] ISC 2015 Message-ID: Anybody at ISC 2015 in Frankfurt next week? I am happy to share my experience with supporting four ESP (a.k.a beta) customers of the new protocol feature. You can find me at the IBM booth (Booth 928). -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From daniel.kidger at uk.ibm.com Thu Jul 9 09:12:51 2015 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Thu, 9 Jul 2015 09:12:51 +0100 Subject: [gpfsug-discuss] ISC 2015 In-Reply-To: Message-ID: <1970894201.4637011436429559512.JavaMail.notes@d06wgw86.portsmouth.uk.ibm.com> Ulf, I am certainly interested. You can find me on the IBM booth too :-) Looking forward to meeting you. Daniel Sent from IBM Verse Ulf Troppens --- [gpfsug-discuss] ISC 2015 --- From:"Ulf Troppens" To:"gpfsug main discussion list" Date:Thu, 9 Jul 2015 08:55Subject:[gpfsug-discuss] ISC 2015 Anybody at ISC 2015 in Frankfurt next week? I am happy to share my experience with supporting four ESP (a.k.a beta) customers of the new protocol feature. You can find me at the IBM booth (Booth 928). 
-- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Jul 9 15:56:42 2015 From: ewahl at osc.edu (Wahl, Edward) Date: Thu, 9 Jul 2015 14:56:42 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: , Message-ID: <9DA9EC7A281AC7428A9618AFDC49049955A5DCC4@CIO-KRC-D1MBX02.osuad.osu.edu> Please please please please PLEASE tell me that support for RHEL 6 is in the plan for protocol nodes. Forcing us to 7 seems rather VERY premature. been out sick a week so I just saw this, FYI. I'd sell my co-workers to test out protocol nodes, but frankly NOT on RHEL 7. Definitely NOT an HPC ready release. ugh. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Bob Oesterlin [oester at gmail.com] Sent: Thursday, July 02, 2015 12:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. You do have the option of doing things manually as well. The guide that describes it is pretty good. If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
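As an aside to the protocol-node discussion above, for anyone who prefers to wire up CES by hand rather than through the spectrumscale installer, a rough sketch of the steps looks like the following. The node name and service address are placeholders, and the exact option spellings should be checked against the 4.1.1 mmchnode and mmces man pages:

   # designate an existing RHEL 7 node as a CES (protocol) node
   mmchnode --ces-enable -N prot01
   # give CES a floating service address, then enable and check the SMB service
   mmces address add --ces-ip 10.10.10.100
   mmces service enable SMB
   mmces service list
   mmces node list

As noted in the thread above, "enable" and "start" are separate steps for the CES services, which is easy to trip over the first time.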
URL: From sdinardo at ebi.ac.uk Fri Jul 10 11:07:28 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Fri, 10 Jul 2015 11:07:28 +0100 Subject: [gpfsug-discuss] data interface and management infercace. Message-ID: <559F9960.7010509@ebi.ac.uk> Hello guys. Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: --daemon-interface={hostname | ip_address} Specifies the host name or IP address _*to be used by the GPFS daemons for node-to-node communication*_. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command tothe desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. What exactly means "node-to node-communications" ? Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? _**_ Thanks in advance, Salvatore -------------- next part -------------- An HTML attachment was scrubbed... 
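To make the admin/daemon split concrete, a minimal sketch of the change being discussed above might look like this. The "-mgmt" hostnames are invented and assumed to resolve to addresses on a 1GbE management network; depending on the release the daemon may need to be stopped on the nodes being changed, so doing it inside the planned downtime is the safe option:

   mmshutdown -N gss01a.ebi.ac.uk
   mmchnode --admin-interface=gss01a-mgmt.ebi.ac.uk -N gss01a.ebi.ac.uk
   mmstartup -N gss01a.ebi.ac.uk
   mmlscluster    # the "Admin node name" column should now show the -mgmt name

Note that this only moves the administrative (ssh/scp) traffic. The daemon interface, and therefore the lease, token and data traffic, stays where it is unless the daemon interface or the subnets setting is changed as well.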
URL: From secretary at gpfsug.org Fri Jul 10 12:33:48 2015 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Fri, 10 Jul 2015 12:33:48 +0100 Subject: [gpfsug-discuss] Places available: Meet the Devs Message-ID: Dear All, There are a couple of places remaining at the next 'Meet the Devs' event on Wednesday 29th July, 11am-3pm. The event is being held at IBM Warwick. The agenda promises to be hands on and give you the opportunity to speak face to face with the developers of GPFS. Guideline agenda: * Data analytic workloads - development to show and tell UK work on establishing use cases and tighter integration of Spark on top of GPFS * Show the GUI coming in 4.2 * Discuss 4.2 and beyond roadmap * How would you like IP management to work for protocol access? * Optional - Team can demo & discuss NFS/SMB/Object integration into Scale Lunch and refreshments will be provided. Please can you let me know by email if you are interested in attending and I'll register your place. Thanks and we hope to see you there! -- Claire O'Toole (n?e Robson) GPFS User Group Secretary +44 (0)7508 033896 www.gpfsug.org From S.J.Thompson at bham.ac.uk Fri Jul 10 12:59:19 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 10 Jul 2015 11:59:19 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: <9DA9EC7A281AC7428A9618AFDC49049955A5DCC4@CIO-KRC-D1MBX02.osuad.osu.edu> References: <9DA9EC7A281AC7428A9618AFDC49049955A5DCC4@CIO-KRC-D1MBX02.osuad.osu.edu> Message-ID: Hi Ed, Well, technically: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_protocolsprerequisites.htm Says "The spectrumscale installation toolkit supports Red Hat Enterprise Linux 7.0 and 7.1 platforms on x86_64 and ppc64 architectures" So maybe if you don?t want to use the installer, you don't need RHEL 7. Of course where or not that is supported, only IBM would be able to say ? I?ve only looked at gpfs.smb, but as its provided as a binary RPM, it might or might not work in a 6 environment (it bundles ctdb etc all in). For object, as its a bundle of openstack RPMs, then potentially it won?t work on EL6 depending on the python requirements? And surely you aren?t running protocol support on HPC nodes anyway ... so maybe a few EL7 nodes could work for you? Simon From: , Edward > Reply-To: gpfsug main discussion list > Date: Thursday, 9 July 2015 15:56 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Please please please please PLEASE tell me that support for RHEL 6 is in the plan for protocol nodes. Forcing us to 7 seems rather VERY premature. been out sick a week so I just saw this, FYI. I'd sell my co-workers to test out protocol nodes, but frankly NOT on RHEL 7. Definitely NOT an HPC ready release. ugh. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Bob Oesterlin [oester at gmail.com] Sent: Thursday, July 02, 2015 12:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. 
You do have the option of doing things manually as well. The guide that describes it is pretty good. If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 10 13:06:01 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 10 Jul 2015 12:06:01 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Message-ID: So IBM came back and said what I was doing wasn?t supported. They did say that you can use ?user defined? authentication. Which I?ve got working now on my environment (figured what I was doing wrong, and you can?t use mmsmb to do some of the bits I need for it to work for user defined mode for me...). But I still think it needs a patch to one of the files for CES for use in user defined authentication. (Right now it appears to remove all my ?user defined? settings from nsswitch.conf when you stop CES/GPFS on a node). I?ve supplied my patch to IBM which works for my case, we?ll see what they do about it? (If people are interested, I?ll gather my notes into a blog post). Simon On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: >Just to chime in as another interested party, we do something fairly >similar but use sssd instead of nslcd. Very interested to see how >accommodating the IBM Samba is to local configuration needs. > >Best, >Kristy > >On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >Services) wrote: > >> Hi, >> >> (sorry, lots of questions about this stuff at the moment!) >> >> I?m currently looking at removing the sernet smb configs we had >>previously >> and moving to IBM SMB. I?ve removed all the old packages and only now >>have >> gpfs.smb installed on the systems. >> >> I?m struggling to get the config tools to work for our environment. >> >> We have MS Windows AD Domain for authentication. 
For various reasons, >> however doesn?t hold the UIDs/GIDs, which are instead held in a >>different >> LDAP directory. >> >> In the past, we?d configure the Linux servers running Samba so that >>NSLCD >> was configured to get details from the LDAP server. (e.g. getent passwd >> would return the data for an AD user). The Linux boxes would also be >> configured to use KRB5 authentication where users were allowed to ssh >>etc >> in for password authentication. >> >> So as far as Samba was concerned, it would do ?security = ADS? and then >> we?d also have "idmap config * : backend = tdb2? >> >> I.e. Use Domain for authentication, but look locally for ID mapping >>data. >> >> Now I can configured IBM SMB to use ADS for authentication: >> >> mmuserauth service create --type ad --data-access-method file >> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >> --idmap-role subordinate >> >> >> However I can?t see anyway for me to manipulate the config so that it >> doesn?t use autorid. Using this we end up with: >> >> mmsmb config list | grep -i idmap >> idmap config * : backend autorid >> idmap config * : range 10000000-299999999 >> idmap config * : rangesize 1000000 >> idmap config * : read only yes >> idmap:cache no >> >> >> It also adds: >> >> mmsmb config list | grep -i auth >> auth methods guest sam winbind >> >> (though I don?t think that is a problem). >> >> >> I also can?t change the idmap using the mmsmb command (I think would >>look >> like this): >> # mmsmb config change --option="idmap config * : backend=tdb2" >> idmap config * : backend=tdb2: [E] Unsupported smb option. More >> information about smb options is availabe in the man page. >> >> >> >> I can?t see anything in the docs at: >> >>http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>ct >> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >> >> That give me a clue how to do what I want. >> >> I?d be happy to do some mixture of AD for authentication and LDAP for >> lookups (rather than just falling back to ?local? from nslcd), but I >>can?t >> see a way to do this, and ?manual? seems to stop ADS authentication in >> Samba. >> >> Anyone got any suggestions? >> >> >> Thanks >> >> Simon >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Daniel.Vogel at abcsystems.ch Fri Jul 10 15:19:11 2015 From: Daniel.Vogel at abcsystems.ch (Daniel Vogel) Date: Fri, 10 Jul 2015 14:19:11 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <201507032249.t63Mnffp025995@d03av03.boulder.ibm.com> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch><201507011422.t61EMZmw011626@d01av01.pok.ibm.com> <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> <201507032249.t63Mnffp025995@d03av03.boulder.ibm.com> Message-ID: <2CDF270206A255459AC4FA6B08E52AF90114635E8E@ABCSYSEXC1.abcsystems.ch> For ?1? we use the quorum node to do ?start disk? or ?restripe file system? (quorum node without disks). For ?2? we use kernel NFS with cNFS I used the command ?cnfsNFSDprocs 64? to set the NFS threads. Is this correct? 
gpfs01:~ # cat /proc/fs/nfsd/threads 64 I will verify the settings in our lab, will use the following configuration: mmchconfig worker1Threads=128 mmchconfig prefetchThreads=128 mmchconfig nsdMaxWorkerThreads=128 mmchconfig cnfsNFSDprocs=256 daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Samstag, 4. Juli 2015 00:49 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? this triggers a few questions 1. have you tried running it only on a node that doesn't serve NFS data ? 2. what NFS stack are you using ? is this the kernel NFS Server as part of linux means you use cNFS ? if the answer to 2 is yes, have you adjusted the nfsd threads in /etc/sysconfig/nfs ? the default is only 8 and if you run with the default you have a very low number of threads from the outside competing with a larger number of threads doing restripe, increasing the nfsd threads could help. you could also reduce the number of internal restripe threads to try out if that helps mitigating the impact. to try an extreme low value set the following : mmchconfig pitWorkerThreadsPerNode=1 -i and retry the restripe again, to reset it back to default run mmchconfig pitWorkerThreadsPerNode=DEFAULT -i sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Beschreibung: Inactive hide details for Daniel Vogel ---07/02/2015 12:12:46 AM---Sven, Yes I agree, but ?using ?N? to reduce the load help]Daniel Vogel ---07/02/2015 12:12:46 AM---Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as From: Daniel Vogel > To: "'gpfsug main discussion list'" > Date: 07/02/2015 12:12 AM Subject: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . 
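Putting the two suggestions above together, a low-impact restripe might be driven roughly like this. The file system name and the node list are placeholders; the idea is simply to confine the work to nodes with no performance-critical role and to throttle the per-node worker threads while it runs:

   # throttle the restripe worker threads (reset to DEFAULT afterwards)
   mmchconfig pitWorkerThreadsPerNode=1 -i
   # run the rebalance only on nodes that do not serve NFS or other critical workloads
   mmrestripefs fs1 -b -N node1,node2
   # restore the default once the restripe has finished
   mmchconfig pitWorkerThreadsPerNode=DEFAULT -i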
sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Beschreibung: Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS]Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 10 15:56:04 2015 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 10 Jul 2015 14:56:04 +0000 Subject: [gpfsug-discuss] Fwd: GPFS 4.1, NFSv4, and authenticating against AD References: <69C83493-2E22-4B11-BF15-A276DA6D4901@vanderbilt.edu> Message-ID: <55426129-67A0-4071-91F4-715BAC1F0DBE@vanderbilt.edu> Begin forwarded message: From: buterbkl > Subject: GPFS 4.1, NFSv4, and authenticating against AD Date: July 10, 2015 at 9:52:38 AM CDT To: gpfs-general at sdsc.edu Hi All, We are under the (hopefully not mistaken) impression that with GPFS 4.1 supporting NFSv4 it should be possible to have a CNFS setup authenticate against Active Directory as long as you use NFSv4. I also thought that I had seen somewhere (possibly one of the two GPFS related mailing lists I?m on, or in a DeveloperWorks article, or ???) that IBM has published documentation on how to set this up (a kind of cookbook). I?ve done a fair amount of Googling looking for such a document, but I seem to be uniquely talented in not being able to find things with Google! :-( Does anyone know of such a document and could send me the link to it? It would be very helpful to us as I?ve got essentially zero experience with Kerberos (which I think is required to talk to AD) and the institutions? AD environment is managed by a separate department. Thanks in advance? Kevin ? 
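There does not seem to be a single GPFS-specific cookbook for this; the moving parts are the generic kernel-NFSv4 plus Kerberos pieces that CNFS sits on top of. A very rough sketch, with an invented realm and export path, would be:

   # /etc/krb5.conf fragment on the CNFS nodes: point the default realm at AD
   [libdefaults]
       default_realm = AD.EXAMPLE.EDU
   [realms]
       AD.EXAMPLE.EDU = {
           kdc = dc1.ad.example.edu
       }

   # /etc/exports on the CNFS nodes: require Kerberos on the GPFS export
   # (the fsid value is arbitrary, but GPFS exports over kernel NFS generally need one)
   /gpfs/fs1  *(rw,sec=krb5,fsid=745)

Each CNFS node also needs an nfs/<hostname> service principal in its keytab (created in AD with ktpass, or by joining the machine to the domain), and the clients need rpc.gssd running. None of this is specific to GPFS, which may be why a dedicated cookbook is hard to find.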
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Mon Jul 13 13:31:18 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Mon, 13 Jul 2015 13:31:18 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <559F9960.7010509@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> Message-ID: <55A3AF96.3060303@ebi.ac.uk> Anyone? On 10/07/15 11:07, Salvatore Di Nardo wrote: > Hello guys. > Quite a while ago i mentioned that we have a big expel issue on our > gss ( first gen) and white a lot people suggested that the root cause > could be that we use the same interface for all the traffic, and that > we should split the data network from the admin network. Finally we > could plan a downtime and we are migrating the data out so, i can soon > safelly play with the change, but looking what exactly i should to do > i'm a bit puzzled. Our mmlscluster looks like this: > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name > Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk > quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk > quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk > quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk > quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk > quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk > quorum-manager > > > It was my understanding that the "admin node" should use a different > interface ( a 1g link copper should be fine), while the daemon node is > where the data was passing , so should point to the bonded 10g > interfaces. but when i read the mmchnode man page i start to be quite > confused. It says: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address _*to > be used by the GPFS daemons for node-to-node communication*_. The > host name or IP address must refer to the communication adapter over > which the GPFS daemons communicate. > Alias interfaces are not allowed. Use the > original address or a name that is resolved by the host command to > that original address. > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by > GPFS administration commands when communicating between nodes. The > admin node name must be specified as an IP address or a hostname that > is resolved by the host command > tothe desired IP address. If the keyword > DEFAULT is specified, the admin interface for the node is set to be > equal to the daemon interface for the node. > > What exactly means "node-to node-communications" ? > Means DATA or also the "lease renew", and the token communication > between the clients to get/steal the locks to be able to manage > concurrent write to thr same file? 
> Since we are getting expells ( especially when several clients > contends the same file ) i assumed i have to split this type of > packages from the data stream, but reading the documentation it looks > to me that those internal comunication between nodes use the > daemon-interface wich i suppose are used also for the data. so HOW > exactly i can split them? > > > Thanks in advance, > Salvatore > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Mon Jul 13 14:29:50 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Mon, 13 Jul 2015 14:29:50 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> Message-ID: <55A3BD4E.3000205@ebi.ac.uk> Hello Vic. We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: --daemon-interface={hostname | ip_address} Specifies the host name or IP address _*to be used by the GPFS daemons for node-to-node communication*_. The host name or IP address must refer to the commu- nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. Regards, Salvatore On 13/07/15 14:00, Vic Cornell wrote: > Hi Salavatore, > > Does your GSS have the facility for a 1GbE ?management? network? If so > I think that changing the ?admin? node names of the cluster members to > a set of IPs on the management network would give you the split that > you need. > > What about the clients? Can they also connect to a separate admin network? > > Remember that if you are using multi-cluster all of the nodes in both > networks must share the same admin network. > > Kind Regards, > > Vic > > >> On 13 Jul 2015, at 13:31, Salvatore Di Nardo > > wrote: >> >> Anyone? >> >> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>> Hello guys. 
>>> Quite a while ago i mentioned that we have a big expel issue on our >>> gss ( first gen) and white a lot people suggested that the root >>> cause could be that we use the same interface for all the traffic, >>> and that we should split the data network from the admin network. >>> Finally we could plan a downtime and we are migrating the data out >>> so, i can soon safelly play with the change, but looking what >>> exactly i should to do i'm a bit puzzled. Our mmlscluster looks like >>> this: >>> >>> GPFS cluster information >>> ======================== >>> GPFS cluster name: GSS.ebi.ac.uk >>> GPFS cluster id: 17987981184946329605 >>> GPFS UID domain: GSS.ebi.ac.uk >>> Remote shell command: /usr/bin/ssh >>> Remote file copy command: /usr/bin/scp >>> >>> GPFS cluster configuration servers: >>> ----------------------------------- >>> Primary server: gss01a.ebi.ac.uk >>> Secondary server: gss02b.ebi.ac.uk >>> >>> Node Daemon node name IP address Admin node >>> name Designation >>> ----------------------------------------------------------------------- >>> 1 gss01a.ebi.ac.uk >>> 10.7.28.2 gss01a.ebi.ac.uk >>> quorum-manager >>> 2 gss01b.ebi.ac.uk >>> 10.7.28.3 gss01b.ebi.ac.uk >>> quorum-manager >>> 3 gss02a.ebi.ac.uk >>> 10.7.28.67 gss02a.ebi.ac.uk >>> quorum-manager >>> 4 gss02b.ebi.ac.uk >>> 10.7.28.66 gss02b.ebi.ac.uk >>> quorum-manager >>> 5 gss03a.ebi.ac.uk >>> 10.7.28.34 gss03a.ebi.ac.uk >>> quorum-manager >>> 6 gss03b.ebi.ac.uk >>> 10.7.28.35 gss03b.ebi.ac.uk >>> quorum-manager >>> >>> >>> It was my understanding that the "admin node" should use a different >>> interface ( a 1g link copper should be fine), while the daemon node >>> is where the data was passing , so should point to the bonded 10g >>> interfaces. but when i read the mmchnode man page i start to be >>> quite confused. It says: >>> >>> --daemon-interface={hostname | ip_address} >>> Specifies the host name or IP address >>> _*to be used by the GPFS daemons for node-to-node communication*_. >>> The host name or IP address must refer to the communication adapter >>> over which the GPFS daemons communicate. >>> Alias interfaces are not allowed. Use the >>> original address or a name that is resolved by the host command to >>> that original address. >>> >>> --admin-interface={hostname | ip_address} >>> Specifies the name of the node to be used >>> by GPFS administration commands when communicating between nodes. >>> The admin node name must be specified as an IP address or a hostname >>> that is resolved by the host command >>> tothe desired IP address. If the keyword >>> DEFAULT is specified, the admin interface for the node is set to be >>> equal to the daemon interface for the node. >>> >>> What exactly means "node-to node-communications" ? >>> Means DATA or also the "lease renew", and the token communication >>> between the clients to get/steal the locks to be able to manage >>> concurrent write to thr same file? >>> Since we are getting expells ( especially when several clients >>> contends the same file ) i assumed i have to split this type of >>> packages from the data stream, but reading the documentation it >>> looks to me that those internal comunication between nodes use the >>> daemon-interface wich i suppose are used also for the data. so HOW >>> exactly i can split them? 
>>> >>> >>> Thanks in advance, >>> Salvatore >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss atgpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From viccornell at gmail.com Mon Jul 13 15:25:32 2015 From: viccornell at gmail.com (Vic Cornell) Date: Mon, 13 Jul 2015 15:25:32 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A3BD4E.3000205@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: Hi Salvatore, I agree that that is what the manual - and some of the wiki entries say. However , when we have had problems (typically congestion) with ethernet networks in the past (20GbE or 40GbE) we have resolved them by setting up a separate ?Admin? network. The before and after cluster health we have seen measured in number of expels and waiters has been very marked. Maybe someone ?in the know? could comment on this split. Regards, Vic > On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: > > Hello Vic. > We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP > address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the > node is set to be equal to the daemon interface for the node. > > So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the commu- > nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the > host command to that original address. > > The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. > > > Regards, > Salvatore > > > > On 13/07/15 14:00, Vic Cornell wrote: >> Hi Salavatore, >> >> Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. >> >> What about the clients? Can they also connect to a separate admin network? >> >> Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. 
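As a way of checking what the suggested change actually does, the following read-only commands show where admin and daemon traffic will go once the admin node names are switched to the management network (nothing assumed beyond a GPFS release that has mmdiag):

   mmlscluster                  # compare the "Daemon node name" and "Admin node name" columns
   mmdiag --network             # lists the daemon-to-daemon connections and the IPs they use
   mmlsconfig subnets           # shows any subnets setting that steers daemon traffic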
>> >> Kind Regards, >> >> Vic >> >> >>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo > wrote: >>> >>> Anyone? >>> >>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>> Hello guys. >>>> Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: >>>> >>>> GPFS cluster information >>>> ======================== >>>> GPFS cluster name: GSS.ebi.ac.uk >>>> GPFS cluster id: 17987981184946329605 >>>> GPFS UID domain: GSS.ebi.ac.uk >>>> Remote shell command: /usr/bin/ssh >>>> Remote file copy command: /usr/bin/scp >>>> >>>> GPFS cluster configuration servers: >>>> ----------------------------------- >>>> Primary server: gss01a.ebi.ac.uk >>>> Secondary server: gss02b.ebi.ac.uk >>>> >>>> Node Daemon node name IP address Admin node name Designation >>>> ----------------------------------------------------------------------- >>>> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >>>> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >>>> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >>>> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >>>> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >>>> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >>>> >>>> It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: >>>> >>>> --daemon-interface={hostname | ip_address} >>>> Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. >>>> Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. >>>> >>>> --admin-interface={hostname | ip_address} >>>> Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command >>>> to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. >>>> >>>> What exactly means "node-to node-communications" ? >>>> Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? >>>> Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? 
>>>> >>>> >>>> Thanks in advance, >>>> Salvatore >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jhick at lbl.gov Mon Jul 13 16:22:58 2015 From: jhick at lbl.gov (Jason Hick) Date: Mon, 13 Jul 2015 08:22:58 -0700 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: Hi, Yes having separate data and management networks has been critical for us for keeping health monitoring/communication unimpeded by data movement. Not as important, but you can also tune the networks differently (packet sizes, buffer sizes, SAK, etc) which can help. Jason > On Jul 13, 2015, at 7:25 AM, Vic Cornell wrote: > > Hi Salvatore, > > I agree that that is what the manual - and some of the wiki entries say. > > However , when we have had problems (typically congestion) with ethernet networks in the past (20GbE or 40GbE) we have resolved them by setting up a separate ?Admin? network. > > The before and after cluster health we have seen measured in number of expels and waiters has been very marked. > > Maybe someone ?in the know? could comment on this split. > > Regards, > > Vic > > >> On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: >> >> Hello Vic. >> We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: >> >> --admin-interface={hostname | ip_address} >> Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP >> address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the >> node is set to be equal to the daemon interface for the node. >> >> So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: >> >> --daemon-interface={hostname | ip_address} >> Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the commu- >> nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the >> host command to that original address. >> >> The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. 
>> >> >> Regards, >> Salvatore >> >> >> >>> On 13/07/15 14:00, Vic Cornell wrote: >>> Hi Salavatore, >>> >>> Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. >>> >>> What about the clients? Can they also connect to a separate admin network? >>> >>> Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. >>> >>> Kind Regards, >>> >>> Vic >>> >>> >>>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo wrote: >>>> >>>> Anyone? >>>> >>>>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>>> Hello guys. >>>>> Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: >>>>> >>>>> GPFS cluster information >>>>> ======================== >>>>> GPFS cluster name: GSS.ebi.ac.uk >>>>> GPFS cluster id: 17987981184946329605 >>>>> GPFS UID domain: GSS.ebi.ac.uk >>>>> Remote shell command: /usr/bin/ssh >>>>> Remote file copy command: /usr/bin/scp >>>>> >>>>> GPFS cluster configuration servers: >>>>> ----------------------------------- >>>>> Primary server: gss01a.ebi.ac.uk >>>>> Secondary server: gss02b.ebi.ac.uk >>>>> >>>>> Node Daemon node name IP address Admin node name Designation >>>>> ----------------------------------------------------------------------- >>>>> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >>>>> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >>>>> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >>>>> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >>>>> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >>>>> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >>>>> >>>>> It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: >>>>> >>>>> --daemon-interface={hostname | ip_address} >>>>> Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. >>>>> Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. >>>>> >>>>> --admin-interface={hostname | ip_address} >>>>> Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command >>>>> to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. >>>>> >>>>> What exactly means "node-to node-communications" ? 
>>>>> Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? >>>>> Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? >>>>> >>>>> >>>>> Thanks in advance, >>>>> Salvatore >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdenham at gmail.com Mon Jul 13 17:45:48 2015 From: sdenham at gmail.com (Scott D) Date: Mon, 13 Jul 2015 11:45:48 -0500 Subject: [gpfsug-discuss] data interface and management infercace. Message-ID: I spent a good deal of time exploring this topic when I was at IBM. I think there are two key aspects here; the congestion of the actual interfaces on the [cluster, FS, token] management nodes and competition for other resources like CPU cycles on those nodes. When using a single Ethernet interface (or for that matter IB RDMA + IPoIB over the same interface), at some point the two kinds of traffic begin to conflict. The management traffic being much more time sensitive suffers as a result. One solution is to separate the traffic. For larger clusters though (1000s of nodes), a better solution, that may avoid having to have a 2nd interface on every client node, is to add dedicated nodes as managers and not rely on NSD servers for this. It does cost you some modest servers and GPFS server licenses. My previous client generally used previous-generation retired compute nodes for this job. Scott Date: Mon, 13 Jul 2015 15:25:32 +0100 > From: Vic Cornell > Subject: Re: [gpfsug-discuss] data interface and management infercace. > > Hi Salvatore, > > I agree that that is what the manual - and some of the wiki entries say. > > However , when we have had problems (typically congestion) with ethernet > networks in the past (20GbE or 40GbE) we have resolved them by setting up a > separate ?Admin? network. > > The before and after cluster health we have seen measured in number of > expels and waiters has been very marked. > > Maybe someone ?in the know? could comment on this split. > > Regards, > > Vic > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mhabib73 at gmail.com Mon Jul 13 18:19:36 2015 From: mhabib73 at gmail.com (Muhammad Habib) Date: Mon, 13 Jul 2015 13:19:36 -0400 Subject: [gpfsug-discuss] data interface and management infercace. 
In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: Did you look at "subnets" parameter used with "mmchconfig" command. I think you can use order list of subnets for daemon communication and then actual daemon interface can be used for data transfer. When the GPFS will start it will use actual daemon interface for communication , however , once its started , it will use the IPs from the subnet list whichever coming first in the list. To further validate , you can put network sniffer before you do actual implementation or alternatively you can open a PMR with IBM. If your cluster having expel situation , you may fine tune your cluster e.g. increase ping timeout period , having multiple NSD servers and distributing filesystems across these NSD servers. Also critical servers can have HBA cards installed for direct I/O through fiber. Thanks On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick wrote: > Hi, > > Yes having separate data and management networks has been critical for us > for keeping health monitoring/communication unimpeded by data movement. > > Not as important, but you can also tune the networks differently (packet > sizes, buffer sizes, SAK, etc) which can help. > > Jason > > On Jul 13, 2015, at 7:25 AM, Vic Cornell wrote: > > Hi Salvatore, > > I agree that that is what the manual - and some of the wiki entries say. > > However , when we have had problems (typically congestion) with ethernet > networks in the past (20GbE or 40GbE) we have resolved them by setting up a > separate ?Admin? network. > > The before and after cluster health we have seen measured in number of > expels and waiters has been very marked. > > Maybe someone ?in the know? could comment on this split. > > Regards, > > Vic > > > On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: > > Hello Vic. > We are currently draining our gpfs to do all the recabling to add a > management network, but looking what the admin interface does ( man > mmchnode ) it says something different: > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS > administration commands when communicating between nodes. The admin node > name must be specified as an IP > address or a hostname that is resolved by the > host command to the desired IP address. If the keyword DEFAULT is > specified, the admin interface for the > node is set to be equal to the daemon interface > for the node. > > > So, seems used only for commands propagation, hence have nothing to do > with the node-to-node traffic. Infact the other interface description is: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address *to be > used by the GPFS daemons for node-to-node communication*. The host name > or IP address must refer to the commu- > nication adapter over which the GPFS daemons > communicate. Alias interfaces are not allowed. Use the original address or > a name that is resolved by the > host command to that original address. > > > The "expired lease" issue and file locking mechanism a( most of our > expells happens when 2 clients try to write in the same file) are exactly > node-to node-comunication, so im wondering what's the point to separate > the "admin network". I want to be sure to plan the right changes before we > do a so massive task. We are talking about adding a new interface on 700 > clients, so the recabling work its not small. 
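For reference, the subnets parameter mentioned above is set with mmchconfig. A hypothetical example, with made-up addressing for a 10GbE data network, looks like this; note that subnets selects which network the daemon-to-daemon traffic uses, it does not by itself separate admin traffic from daemon traffic:

   # prefer the (hypothetical) 10GbE data subnet for daemon-to-daemon traffic
   mmchconfig subnets="10.20.0.0"
   # the change takes effect as the daemons are restarted; verify with
   mmdiag --network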
> > > Regards, > Salvatore > > > > On 13/07/15 14:00, Vic Cornell wrote: > > Hi Salavatore, > > Does your GSS have the facility for a 1GbE ?management? network? If so I > think that changing the ?admin? node names of the cluster members to a set > of IPs on the management network would give you the split that you need. > > What about the clients? Can they also connect to a separate admin > network? > > Remember that if you are using multi-cluster all of the nodes in both > networks must share the same admin network. > > Kind Regards, > > Vic > > > On 13 Jul 2015, at 13:31, Salvatore Di Nardo wrote: > > Anyone? > > On 10/07/15 11:07, Salvatore Di Nardo wrote: > > Hello guys. > Quite a while ago i mentioned that we have a big expel issue on our gss ( > first gen) and white a lot people suggested that the root cause could be > that we use the same interface for all the traffic, and that we should > split the data network from the admin network. Finally we could plan a > downtime and we are migrating the data out so, i can soon safelly play with > the change, but looking what exactly i should to do i'm a bit puzzled. Our > mmlscluster looks like this: > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > It was my understanding that the "admin node" should use a different > interface ( a 1g link copper should be fine), while the daemon node is > where the data was passing , so should point to the bonded 10g interfaces. > but when i read the mmchnode man page i start to be quite confused. It says: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address *to be > used by the GPFS daemons for node-to-node communication*. The host name > or IP address must refer to the communication adapter over which the GPFS > daemons communicate. > Alias interfaces are not allowed. Use the > original address or a name that is resolved by the host command to that > original address. > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS > administration commands when communicating between nodes. The admin node > name must be specified as an IP address or a hostname that is resolved by > the host command > to the desired IP address. If the keyword > DEFAULT is specified, the admin interface for the node is set to be equal > to the daemon interface for the node. > > What exactly means "node-to node-communications" ? > Means DATA or also the "lease renew", and the token communication between > the clients to get/steal the locks to be able to manage concurrent write to > thr same file? 
> Since we are getting expells ( especially when several clients contends > the same file ) i assumed i have to split this type of packages from the > data stream, but reading the documentation it looks to me that those > internal comunication between nodes use the daemon-interface wich i suppose > are used also for the data. so HOW exactly i can split them? > > > Thanks in advance, > Salvatore > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This communication contains confidential information intended only for the persons to whom it is addressed. Any other distribution, copying or disclosure is strictly prohibited. If you have received this communication in error, please notify the sender and delete this e-mail message immediately. Le pr?sent message contient des renseignements de nature confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, distribution, divulgation, utilisation ou reproduction de la pr?sente communication, et de tout fichier qui y est joint, est strictement interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur et supprimer le message de votre ordinateur et de votre serveur. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Mon Jul 13 18:42:47 2015 From: oester at gmail.com (Bob Oesterlin) Date: Mon, 13 Jul 2015 12:42:47 -0500 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: Message-ID: Some thoughts on node expels, based on the last 2-3 months of "expel hell" here. We've spent a lot of time looking at this issue, across multiple clusters. A big thanks to IBM for helping us center in on the right issues. First, you need to understand if the expels are due to "expired lease" message, or expels due to "communication issues". It sounds like you are talking about the latter. In the case of nodes being expelled due to communication issues, it's more likely the problem in related to network congestion. This can occur at many levels - the node, the network, or the switch. When it's a communication issue, changing prams like "missed ping timeout" isn't going to help you. The problem for us ended up being that GPFS wasn't getting a response to a periodic "keep alive" poll to the node, and after 300 seconds, it declared the node dead and expelled it. You can tell if this is the issue by starting to look at the RPC waiters just before the expel. If you see something like "Waiting for poll on sock" RPC, that the node is waiting for that periodic poll to return, and it's not seeing it. The response is either lost in the network, sitting on the network queue, or the node is too busy to send it. 
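For anyone trying to confirm the same diagnosis, the waiters Bob refers to can be sampled on the NSD servers while the problem is happening; a rough sketch (the log path and interval are arbitrary):

  # one-off snapshot of the RPC waiters on this node
  mmdiag --waiters

  # crude loop to capture waiters around the time of an expel
  while true; do
    date >> /tmp/waiters.log
    mmdiag --waiters >> /tmp/waiters.log
    sleep 5
  done

Long-lived "waiting for poll on sock" entries piling up just before the expel point at congestion rather than a genuinely dead node, which is the distinction Bob draws above.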
You may also see RPC's like "waiting for exclusive use of connection" RPC - this is another clear indication of network congestion. Look at the GPFSUG presentions (http://www.gpfsug.org/presentations/) for one by Jason Hick (NERSC) - he also talks about these issues. You need to take a look at net.ipv4.tcp_wmem and net.ipv4.tcp_rmem, especially if you have client nodes that are on slower network interfaces. In our case, it was a number of factors - adjusting these settings, looking at congestion at the switch level, and some physical hardware issues. I would be happy to discuss in more detail (offline) if you want). There are no simple solutions. :-) Bob Oesterlin, Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Mon, Jul 13, 2015 at 11:45 AM, Scott D wrote: > I spent a good deal of time exploring this topic when I was at IBM. I > think there are two key aspects here; the congestion of the actual > interfaces on the [cluster, FS, token] management nodes and competition for > other resources like CPU cycles on those nodes. When using a single > Ethernet interface (or for that matter IB RDMA + IPoIB over the same > interface), at some point the two kinds of traffic begin to conflict. The > management traffic being much more time sensitive suffers as a result. One > solution is to separate the traffic. For larger clusters though (1000s of > nodes), a better solution, that may avoid having to have a 2nd interface on > every client node, is to add dedicated nodes as managers and not rely on > NSD servers for this. It does cost you some modest servers and GPFS server > licenses. My previous client generally used previous-generation retired > compute nodes for this job. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hagley at cscs.ch Tue Jul 14 08:31:04 2015 From: hagley at cscs.ch (Hagley Birgit) Date: Tue, 14 Jul 2015 07:31:04 +0000 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A3BD4E.3000205@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com>, <55A3BD4E.3000205@ebi.ac.uk> Message-ID: <97B2355E006F044E9B8518711889B13719CF3810@MBX114.d.ethz.ch> Hello Salvatore, as you wrote that you have about 700 clients, maybe also the tuning recommendations for large GPFS clusters are helpful for you. They are on the developerworks GPFS wiki: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20Network%20Tuning To my experience especially "failureDetectionTime" and "minMissedPingTimeout" may help in case of expelled nodes. In case you use InfiniBand, for RDMA, there also is a "Best Practices RDMA Tuning" page: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20RDMA%20Tuning Regards Birgit ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Monday, July 13, 2015 3:29 PM To: Vic Cornell Cc: gpfsug main discussion list Subject: Re: [gpfsug-discuss] data interface and management infercace. Hello Vic. 
We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the commu- nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. Regards, Salvatore On 13/07/15 14:00, Vic Cornell wrote: Hi Salavatore, Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. What about the clients? Can they also connect to a separate admin network? Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. Kind Regards, Vic On 13 Jul 2015, at 13:31, Salvatore Di Nardo > wrote: Anyone? On 10/07/15 11:07, Salvatore Di Nardo wrote: Hello guys. Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. 
Our mmlscluster looks like this: GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. What exactly means "node-to node-communications" ? Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? Thanks in advance, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Jul 14 09:15:26 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Jul 2015 09:15:26 +0100 Subject: [gpfsug-discuss] data interface and management infercace. 
In-Reply-To: <97B2355E006F044E9B8518711889B13719CF3810@MBX114.d.ethz.ch> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com>, <55A3BD4E.3000205@ebi.ac.uk> <97B2355E006F044E9B8518711889B13719CF3810@MBX114.d.ethz.ch> Message-ID: <55A4C51E.8050606@ebi.ac.uk> Thanks, this has already been done ( without too much success). We need to rearrange the networking and since somebody experience was to add a copper interface for management i want to do the same, so i'm digging a bit to aundertsand the best way yo do it. Regards, Salvatore On 14/07/15 08:31, Hagley Birgit wrote: > Hello Salvatore, > > as you wrote that you have about 700 clients, maybe also the tuning > recommendations for large GPFS clusters are helpful for you. They are > on the developerworks GPFS wiki: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20Network%20Tuning > > > > To my experience especially "failureDetectionTime" and > "minMissedPingTimeout" may help in case of expelled nodes. > > > In case you use InfiniBand, for RDMA, there also is a "Best Practices > RDMA Tuning" page: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20RDMA%20Tuning > > > > > Regards > Birgit > > ------------------------------------------------------------------------ > *From:* gpfsug-discuss-bounces at gpfsug.org > [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo > [sdinardo at ebi.ac.uk] > *Sent:* Monday, July 13, 2015 3:29 PM > *To:* Vic Cornell > *Cc:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] data interface and management infercace. > > Hello Vic. > We are currently draining our gpfs to do all the recabling to add a > management network, but looking what the admin interface does ( man > mmchnode ) it says something different: > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS > administration commands when communicating between nodes. The > admin node name must be specified as an IP > address or a hostname that is resolved by the host command to > the desired IP address. If the keyword DEFAULT is specified, > the admin interface for the > node is set to be equal to the daemon interface for the node. > > > So, seems used only for commands propagation, hence have nothing to > do with the node-to-node traffic. Infact the other interface > description is: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address _*to be used by the GPFS > daemons for node-to-node communication*_. The host name or IP > address must refer to the commu- > nication adapter over which the GPFS daemons communicate. > Alias interfaces are not allowed. Use the original address or > a name that is resolved by the > host command to that original address. > > > The "expired lease" issue and file locking mechanism a( most of our > expells happens when 2 clients try to write in the same file) are > exactly node-to node-comunication, so im wondering what's the point to > separate the "admin network". I want to be sure to plan the right > changes before we do a so massive task. We are talking about adding a > new interface on 700 clients, so the recabling work its not small. 
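For the archive, the parameters those wiki pages recommend can at least be inspected before anything is changed; this is a hedged sketch only, since the values below are purely illustrative and some of these settings can only be changed with the daemon down or after advice from support:

  # see what the cluster currently runs with
  mmlsconfig failureDetectionTime
  mmlsconfig minMissedPingTimeout

  # illustrative change, not a recommendation
  mmchconfig failureDetectionTime=60

  # TCP buffer tuning from the same network tuning page, again illustrative values
  sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
  sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

None of this addresses the underlying congestion, which is why the recabling is still worth doing; it just makes the cluster less trigger-happy about expelling nodes while the network is busy.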
> > > Regards, > Salvatore > > > > On 13/07/15 14:00, Vic Cornell wrote: >> Hi Salavatore, >> >> Does your GSS have the facility for a 1GbE ?management? network? If >> so I think that changing the ?admin? node names of the cluster >> members to a set of IPs on the management network would give you the >> split that you need. >> >> What about the clients? Can they also connect to a separate admin >> network? >> >> Remember that if you are using multi-cluster all of the nodes in both >> networks must share the same admin network. >> >> Kind Regards, >> >> Vic >> >> >>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo >> > wrote: >>> >>> Anyone? >>> >>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>> Hello guys. >>>> Quite a while ago i mentioned that we have a big expel issue on >>>> our gss ( first gen) and white a lot people suggested that the root >>>> cause could be that we use the same interface for all the traffic, >>>> and that we should split the data network from the admin network. >>>> Finally we could plan a downtime and we are migrating the data out >>>> so, i can soon safelly play with the change, but looking what >>>> exactly i should to do i'm a bit puzzled. Our mmlscluster looks >>>> like this: >>>> >>>> GPFS cluster information >>>> ======================== >>>> GPFS cluster name: GSS.ebi.ac.uk >>>> GPFS cluster id: 17987981184946329605 >>>> GPFS UID domain: GSS.ebi.ac.uk >>>> Remote shell command: /usr/bin/ssh >>>> Remote file copy command: /usr/bin/scp >>>> >>>> GPFS cluster configuration servers: >>>> ----------------------------------- >>>> Primary server: gss01a.ebi.ac.uk >>>> Secondary server: gss02b.ebi.ac.uk >>>> >>>> >>>> Node Daemon node name IP address Admin node >>>> name Designation >>>> ----------------------------------------------------------------------- >>>> 1 gss01a.ebi.ac.uk >>>> 10.7.28.2 gss01a.ebi.ac.uk >>>> quorum-manager >>>> 2 gss01b.ebi.ac.uk >>>> 10.7.28.3 gss01b.ebi.ac.uk >>>> quorum-manager >>>> 3 gss02a.ebi.ac.uk >>>> 10.7.28.67 gss02a.ebi.ac.uk >>>> quorum-manager >>>> 4 gss02b.ebi.ac.uk >>>> 10.7.28.66 gss02b.ebi.ac.uk >>>> quorum-manager >>>> 5 gss03a.ebi.ac.uk >>>> 10.7.28.34 gss03a.ebi.ac.uk >>>> quorum-manager >>>> 6 gss03b.ebi.ac.uk >>>> 10.7.28.35 gss03b.ebi.ac.uk >>>> quorum-manager >>>> >>>> >>>> It was my understanding that the "admin node" should use a >>>> different interface ( a 1g link copper should be fine), while the >>>> daemon node is where the data was passing , so should point to the >>>> bonded 10g interfaces. but when i read the mmchnode man page i >>>> start to be quite confused. It says: >>>> >>>> --daemon-interface={hostname | ip_address} >>>> Specifies the host name or IP address _*to be used by the GPFS >>>> daemons for node-to-node communication*_. The host name or IP >>>> address must refer to the communication adapter over which the GPFS >>>> daemons communicate. >>>> Alias interfaces are not allowed. Use the >>>> original address or a name that is resolved by the host command to >>>> that original address. >>>> >>>> --admin-interface={hostname | ip_address} >>>> Specifies the name of the node to be used by GPFS administration >>>> commands when communicating between nodes. The admin node name must >>>> be specified as an IP address or a hostname that is resolved by >>>> the host command >>>> tothe desired IP address. If the keyword >>>> DEFAULT is specified, the admin interface for the node is set to be >>>> equal to the daemon interface for the node. >>>> >>>> What exactly means "node-to node-communications" ? 
>>>> Means DATA or also the "lease renew", and the token communication >>>> between the clients to get/steal the locks to be able to manage >>>> concurrent write to thr same file? >>>> Since we are getting expells ( especially when several clients >>>> contends the same file ) i assumed i have to split this type of >>>> packages from the data stream, but reading the documentation it >>>> looks to me that those internal comunication between nodes use the >>>> daemon-interface wich i suppose are used also for the data. so HOW >>>> exactly i can split them? >>>> >>>> >>>> Thanks in advance, >>>> Salvatore >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss atgpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Tue Jul 14 16:11:51 2015 From: jtucker at pixitmedia.com (Jez Tucker) Date: Tue, 14 Jul 2015 16:11:51 +0100 Subject: [gpfsug-discuss] Vim highlighting for GPFS available Message-ID: <55A526B7.6080602@pixitmedia.com> Hi everyone, I've released vim highlighting for GPFS policies as a public git repo. https://github.com/arcapix/vim-gpfs Pull requests welcome. Please enjoy your new colourful world. Jez p.s. Apologies to Emacs users. Head of R&D ArcaStream/Pixit Media -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. From jonbernard at gmail.com Wed Jul 15 09:19:49 2015 From: jonbernard at gmail.com (Jon Bernard) Date: Wed, 15 Jul 2015 10:19:49 +0200 Subject: [gpfsug-discuss] GPFS UG 10 Presentations - Sven Oehme In-Reply-To: References: Message-ID: If I may revive this: is trcio publicly available? Jon Bernard On Fri, May 2, 2014 at 5:06 PM, Bob Oesterlin wrote: > It Sven's presentation, he mentions a tools "trcio" (in > /xcat/oehmes/gpfs-clone) > > Where can I find that? > > Bob Oesterlin > > > > On Fri, May 2, 2014 at 9:49 AM, Jez Tucker (Chair) > wrote: > >> Hello all >> >> Firstly, thanks for the feedback we've had so far. Very much >> appreciated. >> >> Secondly, GPFS UG 10 Presentations are now available on the Presentations >> section of the website. >> Any outstanding presentations will follow shortly. 
>> >> See: http://www.gpfsug.org/ >> >> Best regards, >> >> Jez >> >> UG Chair >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Wed Jul 15 10:19:58 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 15 Jul 2015 10:19:58 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: <55A625BE.9000809@ebi.ac.uk> Thanks for the input.. this is actually very interesting! Reading here: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview , specifically the " Using more than one network" part it seems to me that this way we should be able to split the lease/token/ping from the data. Supposing that I implement a GSS cluster with only NDS and a second cluster with only clients: As far i understood if on the NDS cluster add first the subnet 10.20.0.0/16 and then 10.30.0.0 is should use the internal network for all the node-to-node comunication, leaving the 10.30.0.0/30 only for data traffic witht he remote cluster ( the clients). Similarly, in the client cluster, adding first 10.10.0.0/16 and then 10.30.0.0, will guarantee than the node-to-node comunication pass trough a different interface there the data is passing. Since the client are just "clients" the traffic trough 10.10.0.0/16 should be minimal (only token ,lease, ping and so on ) and not affected by the rest. Should be possible at this point move aldo the "admin network" on the internal interface, so we effectively splitted all the "non data" traffic on a dedicated interface. I'm wondering if I'm missing something, and in case i didn't, what could be the real traffic in the internal (black) networks ( 1g link its fine or i still need 10g for that). Another thing I I'm wondering its the load of the "non data" traffic between the clusters.. i suppose some "daemon traffic" goes trough the blue interface for the inter-cluster communication. Any thoughts ? Salvatore On 13/07/15 18:19, Muhammad Habib wrote: > Did you look at "subnets" parameter used with "mmchconfig" command. I > think you can use order list of subnets for daemon communication and > then actual daemon interface can be used for data transfer. When the > GPFS will start it will use actual daemon interface for communication > , however , once its started , it will use the IPs from the subnet > list whichever coming first in the list. To further validate , you > can put network sniffer before you do actual implementation or > alternatively you can open a PMR with IBM. > > If your cluster having expel situation , you may fine tune your > cluster e.g. increase ping timeout period , having multiple NSD > servers and distributing filesystems across these NSD servers. Also > critical servers can have HBA cards installed for direct I/O through > fiber. 
> > Thanks > > On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick > wrote: > > Hi, > > Yes having separate data and management networks has been critical > for us for keeping health monitoring/communication unimpeded by > data movement. > > Not as important, but you can also tune the networks differently > (packet sizes, buffer sizes, SAK, etc) which can help. > > Jason > > On Jul 13, 2015, at 7:25 AM, Vic Cornell > wrote: > >> Hi Salvatore, >> >> I agree that that is what the manual - and some of the wiki >> entries say. >> >> However , when we have had problems (typically congestion) with >> ethernet networks in the past (20GbE or 40GbE) we have resolved >> them by setting up a separate ?Admin? network. >> >> The before and after cluster health we have seen measured in >> number of expels and waiters has been very marked. >> >> Maybe someone ?in the know? could comment on this split. >> >> Regards, >> >> Vic >> >> >>> On 13 Jul 2015, at 14:29, Salvatore Di Nardo >> > wrote: >>> >>> Hello Vic. >>> We are currently draining our gpfs to do all the recabling to >>> add a management network, but looking what the admin interface >>> does ( man mmchnode ) it says something different: >>> >>> --admin-interface={hostname | ip_address} >>> Specifies the name of the node to be used by GPFS >>> administration commands when communicating between >>> nodes. The admin node name must be specified as an IP >>> address or a hostname that is resolved by the host >>> command to the desired IP address. If the keyword >>> DEFAULT is specified, the admin interface for the >>> node is set to be equal to the daemon interface for the >>> node. >>> >>> >>> So, seems used only for commands propagation, hence have >>> nothing to do with the node-to-node traffic. Infact the other >>> interface description is: >>> >>> --daemon-interface={hostname | ip_address} >>> Specifies the host name or IP address _*to be used by >>> the GPFS daemons for node-to-node communication*_. The >>> host name or IP address must refer to the commu- >>> nication adapter over which the GPFS daemons >>> communicate. Alias interfaces are not allowed. Use the >>> original address or a name that is resolved by the >>> host command to that original address. >>> >>> >>> The "expired lease" issue and file locking mechanism a( most of >>> our expells happens when 2 clients try to write in the same >>> file) are exactly node-to node-comunication, so im wondering >>> what's the point to separate the "admin network". I want to be >>> sure to plan the right changes before we do a so massive task. >>> We are talking about adding a new interface on 700 clients, so >>> the recabling work its not small. >>> >>> >>> Regards, >>> Salvatore >>> >>> >>> >>> On 13/07/15 14:00, Vic Cornell wrote: >>>> Hi Salavatore, >>>> >>>> Does your GSS have the facility for a 1GbE ?management? >>>> network? If so I think that changing the ?admin? node names of >>>> the cluster members to a set of IPs on the management network >>>> would give you the split that you need. >>>> >>>> What about the clients? Can they also connect to a separate >>>> admin network? >>>> >>>> Remember that if you are using multi-cluster all of the nodes >>>> in both networks must share the same admin network. >>>> >>>> Kind Regards, >>>> >>>> Vic >>>> >>>> >>>>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo >>>>> > wrote: >>>>> >>>>> Anyone? >>>>> >>>>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>>>> Hello guys. 
>>>>>> Quite a while ago i mentioned that we have a big expel issue >>>>>> on our gss ( first gen) and white a lot people suggested that >>>>>> the root cause could be that we use the same interface for >>>>>> all the traffic, and that we should split the data network >>>>>> from the admin network. Finally we could plan a downtime and >>>>>> we are migrating the data out so, i can soon safelly play >>>>>> with the change, but looking what exactly i should to do i'm >>>>>> a bit puzzled. Our mmlscluster looks like this: >>>>>> >>>>>> GPFS cluster information >>>>>> ======================== >>>>>> GPFS cluster name: GSS.ebi.ac.uk >>>>>> >>>>>> GPFS cluster id: 17987981184946329605 >>>>>> GPFS UID domain: GSS.ebi.ac.uk >>>>>> >>>>>> Remote shell command: /usr/bin/ssh >>>>>> Remote file copy command: /usr/bin/scp >>>>>> >>>>>> GPFS cluster configuration servers: >>>>>> ----------------------------------- >>>>>> Primary server: gss01a.ebi.ac.uk >>>>>> >>>>>> Secondary server: gss02b.ebi.ac.uk >>>>>> >>>>>> >>>>>> Node Daemon node name IP address Admin node >>>>>> name Designation >>>>>> ----------------------------------------------------------------------- >>>>>> 1 gss01a.ebi.ac.uk >>>>>> 10.7.28.2 gss01a.ebi.ac.uk >>>>>> quorum-manager >>>>>> 2 gss01b.ebi.ac.uk >>>>>> 10.7.28.3 gss01b.ebi.ac.uk >>>>>> quorum-manager >>>>>> 3 gss02a.ebi.ac.uk >>>>>> 10.7.28.67 gss02a.ebi.ac.uk >>>>>> quorum-manager >>>>>> 4 gss02b.ebi.ac.uk >>>>>> 10.7.28.66 gss02b.ebi.ac.uk >>>>>> quorum-manager >>>>>> 5 gss03a.ebi.ac.uk >>>>>> 10.7.28.34 gss03a.ebi.ac.uk >>>>>> quorum-manager >>>>>> 6 gss03b.ebi.ac.uk >>>>>> 10.7.28.35 gss03b.ebi.ac.uk >>>>>> quorum-manager >>>>>> >>>>>> >>>>>> It was my understanding that the "admin node" should use a >>>>>> different interface ( a 1g link copper should be fine), while >>>>>> the daemon node is where the data was passing , so should >>>>>> point to the bonded 10g interfaces. but when i read the >>>>>> mmchnode man page i start to be quite confused. It says: >>>>>> >>>>>> --daemon-interface={hostname | ip_address} >>>>>> Specifies the host name or IP address _*to be used by the >>>>>> GPFS daemons for node-to-node communication*_. The host name >>>>>> or IP address must refer to the communication adapter over >>>>>> which the GPFS daemons communicate. >>>>>> Alias interfaces are not allowed. Use the original address or >>>>>> a name that is resolved by the host command to that original >>>>>> address. >>>>>> >>>>>> --admin-interface={hostname | ip_address} >>>>>> Specifies the name of the node to be used by GPFS >>>>>> administration commands when communicating between nodes. The >>>>>> admin node name must be specified as an IP address or a >>>>>> hostname that is resolved by the host command >>>>>> tothe desired IP address. If the >>>>>> keyword DEFAULT is specified, the admin interface for the >>>>>> node is set to be equal to the daemon interface for the node. >>>>>> >>>>>> What exactly means "node-to node-communications" ? >>>>>> Means DATA or also the "lease renew", and the token >>>>>> communication between the clients to get/steal the locks to >>>>>> be able to manage concurrent write to thr same file? >>>>>> Since we are getting expells ( especially when several >>>>>> clients contends the same file ) i assumed i have to split >>>>>> this type of packages from the data stream, but reading the >>>>>> documentation it looks to me that those internal comunication >>>>>> between nodes use the daemon-interface wich i suppose are >>>>>> used also for the data. 
so HOW exactly i can split them? >>>>>> >>>>>> >>>>>> Thanks in advance, >>>>>> Salvatore >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> gpfsug-discuss mailing list >>>>>> gpfsug-discuss atgpfsug.org >>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -- > This communication contains confidential information intended only for > the persons to whom it is addressed. Any other distribution, copying > or disclosure is strictly prohibited. If you have received this > communication in error, please notify the sender and delete this > e-mail message immediately. > > Le pr?sent message contient des renseignements de nature > confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute > diffusion, distribution, divulgation, utilisation ou reproduction de > la pr?sente communication, et de tout fichier qui y est joint, est > strictement interdite. Si vous avez re?u le pr?sent message > ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur > et supprimer le message de votre ordinateur et de votre serveur. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpfs.jpg Type: image/jpeg Size: 28904 bytes Desc: not available URL: From oehmes at gmail.com Wed Jul 15 15:33:11 2015 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 15 Jul 2015 14:33:11 +0000 Subject: [gpfsug-discuss] GPFS UG 10 Presentations - Sven Oehme In-Reply-To: References: Message-ID: Hi Jon, the answer is no, its an development internal tool. sven On Wed, Jul 15, 2015 at 1:20 AM Jon Bernard wrote: > If I may revive this: is trcio publicly available? > > Jon Bernard > > On Fri, May 2, 2014 at 5:06 PM, Bob Oesterlin wrote: > >> It Sven's presentation, he mentions a tools "trcio" (in >> /xcat/oehmes/gpfs-clone) >> >> Where can I find that? >> >> Bob Oesterlin >> >> >> >> On Fri, May 2, 2014 at 9:49 AM, Jez Tucker (Chair) >> wrote: >> >>> Hello all >>> >>> Firstly, thanks for the feedback we've had so far. Very much >>> appreciated. >>> >>> Secondly, GPFS UG 10 Presentations are now available on the >>> Presentations section of the website. >>> Any outstanding presentations will follow shortly. 
>>> >>> See: http://www.gpfsug.org/ >>> >>> Best regards, >>> >>> Jez >>> >>> UG Chair >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Jul 15 15:37:57 2015 From: ewahl at osc.edu (Wahl, Edward) Date: Wed, 15 Jul 2015 14:37:57 +0000 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A625BE.9000809@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> , <55A625BE.9000809@ebi.ac.uk> Message-ID: <9DA9EC7A281AC7428A9618AFDC49049955A606E4@CIO-KRC-D1MBX02.osuad.osu.edu> I don't see this in the thread but perhaps I missed it, what version are you running? I'm still on 3.5 so this is all based on that. A few notes for a little "heads up" here hoping to help with the pitfalls. I seem to recall a number of caveats when I did this a while back. Such as using the 'subnets' option being discussed, stops GPFS from failing over to other TCP networks when there are failures. VERY important! 'mmdiag --network' will show your setup. Definitely verify this if failing downwards is in your plans. We fail from 56Gb RDMA->10GbE TCP-> 1GbE here. And having had it work during some bad power events last year it was VERY nice that the users only noticed a slowdown when we completely lost Lustre and other resources. Also I recall that there was a restriction on having multiple private networks, and some special switch to force this. I have a note about "privateSubnetOverride" so you might read up about this. I seem to recall this was for TCP connections and daemonnodename being a private IP. Or maybe it was that AND mmlscluster having private IPs as well? I think the developerworks wiki had some writeup on this. I don't see it in the admin manuals. Hopefully this may help as you plan this out. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Wednesday, July 15, 2015 5:19 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] data interface and management infercace. Thanks for the input.. this is actually very interesting! Reading here: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview , specifically the " Using more than one network" part it seems to me that this way we should be able to split the lease/token/ping from the data. Supposing that I implement a GSS cluster with only NDS and a second cluster with only clients: [cid:part1.03040109.00080709 at ebi.ac.uk] As far i understood if on the NDS cluster add first the subnet 10.20.0.0/16 and then 10.30.0.0 is should use the internal network for all the node-to-node comunication, leaving the 10.30.0.0/30 only for data traffic witht he remote cluster ( the clients). 
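To make the ordering described above concrete, a minimal sketch using the subnet values from the diagram; the addresses are taken from that example rather than from any recommendation, and the exact form the subnets list takes (with or without a /mask, per-cluster qualifiers and so on) should be checked against the docs for your release:

  # on the storage cluster: prefer its internal network, then the shared data network
  mmchconfig subnets="10.20.0.0 10.30.0.0"

  # on the client cluster: its own internal network first, then the shared one
  mmchconfig subnets="10.10.0.0 10.30.0.0"

  # after the daemons restart, confirm which addresses each connection really uses
  mmdiag --network

Two caveats already raised in this thread apply: subnets only kicks in once the daemon is up (the daemon node names still have to be reachable for the initial join), and per Ed's note it changes the failover behaviour, so it is worth rehearsing a node restart or expel before rolling it out to 700 clients.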
Similarly, in the client cluster, adding first 10.10.0.0/16 and then 10.30.0.0, will guarantee than the node-to-node comunication pass trough a different interface there the data is passing. Since the client are just "clients" the traffic trough 10.10.0.0/16 should be minimal (only token ,lease, ping and so on ) and not affected by the rest. Should be possible at this point move aldo the "admin network" on the internal interface, so we effectively splitted all the "non data" traffic on a dedicated interface. I'm wondering if I'm missing something, and in case i didn't, what could be the real traffic in the internal (black) networks ( 1g link its fine or i still need 10g for that). Another thing I I'm wondering its the load of the "non data" traffic between the clusters.. i suppose some "daemon traffic" goes trough the blue interface for the inter-cluster communication. Any thoughts ? Salvatore On 13/07/15 18:19, Muhammad Habib wrote: Did you look at "subnets" parameter used with "mmchconfig" command. I think you can use order list of subnets for daemon communication and then actual daemon interface can be used for data transfer. When the GPFS will start it will use actual daemon interface for communication , however , once its started , it will use the IPs from the subnet list whichever coming first in the list. To further validate , you can put network sniffer before you do actual implementation or alternatively you can open a PMR with IBM. If your cluster having expel situation , you may fine tune your cluster e.g. increase ping timeout period , having multiple NSD servers and distributing filesystems across these NSD servers. Also critical servers can have HBA cards installed for direct I/O through fiber. Thanks On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick > wrote: Hi, Yes having separate data and management networks has been critical for us for keeping health monitoring/communication unimpeded by data movement. Not as important, but you can also tune the networks differently (packet sizes, buffer sizes, SAK, etc) which can help. Jason On Jul 13, 2015, at 7:25 AM, Vic Cornell > wrote: Hi Salvatore, I agree that that is what the manual - and some of the wiki entries say. However , when we have had problems (typically congestion) with ethernet networks in the past (20GbE or 40GbE) we have resolved them by setting up a separate ?Admin? network. The before and after cluster health we have seen measured in number of expels and waiters has been very marked. Maybe someone ?in the know? could comment on this split. Regards, Vic On 13 Jul 2015, at 14:29, Salvatore Di Nardo > wrote: Hello Vic. We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. 
The host name or IP address must refer to the commu- nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. Regards, Salvatore On 13/07/15 14:00, Vic Cornell wrote: Hi Salavatore, Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. What about the clients? Can they also connect to a separate admin network? Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. Kind Regards, Vic On 13 Jul 2015, at 13:31, Salvatore Di Nardo > wrote: Anyone? On 10/07/15 11:07, Salvatore Di Nardo wrote: Hello guys. Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. 
The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. What exactly means "node-to node-communications" ? Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? Thanks in advance, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- This communication contains confidential information intended only for the persons to whom it is addressed. Any other distribution, copying or disclosure is strictly prohibited. If you have received this communication in error, please notify the sender and delete this e-mail message immediately. Le pr?sent message contient des renseignements de nature confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, distribution, divulgation, utilisation ou reproduction de la pr?sente communication, et de tout fichier qui y est joint, est strictement interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur et supprimer le message de votre ordinateur et de votre serveur. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpfs.jpg Type: image/jpeg Size: 28904 bytes Desc: gpfs.jpg URL: From S.J.Thompson at bham.ac.uk Sun Jul 19 11:45:09 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sun, 19 Jul 2015 10:45:09 +0000 Subject: [gpfsug-discuss] 4.1.1 immutable filesets Message-ID: I was wondering if anyone had looked at the immutable fileset features in 4.1.1? In particular I was looking at the iam compliant mode, but I've a couple of questions. * if I have an iam compliant fileset, and it contains immutable files or directories, can I still unlink and delete the filset? * will HSM work with immutable files? I.e. Can I migrate files to tape and restore them? The docs mention that extended attributes can be updated internally by dmapi, so I guess HSM might work? 
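For reference, the switches involved look roughly like the sketch below. It is only a sketch: the --iam-mode flag spelling and its accepted values should be checked against the 4.1.1 mmchfileset man page, and the filesystem and fileset names are invented.

  # put an existing fileset into compliant immutability mode
  mmchfileset gpfs1 archfset --iam-mode compliant

  # make a file immutable and check the flags afterwards
  mmchattr -i yes /gpfs/gpfs1/archfset/report.pdf
  mmlsattr -L /gpfs/gpfs1/archfset/report.pdf

Whether an unlink/delete of the fileset is then still allowed, and how HSM migrate and recall interact with the immutable flag, are exactly the open questions here, so treat the above as the setup for the experiment rather than the answer.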
Thanks Simon From kraemerf at de.ibm.com Sun Jul 19 13:45:35 2015 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Sun, 19 Jul 2015 14:45:35 +0200 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: Message-ID: >I was wondering if anyone had looked at the immutable fileset features in 4.1.1? yes, Nils Haustein has see: https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insight_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From S.J.Thompson at bham.ac.uk Sun Jul 19 14:35:47 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sun, 19 Jul 2015 13:35:47 +0000 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: , Message-ID: Hi Frank, Yeah id read that this.morning, which is why I was asking... I couldn't see anything about HSM in there or if its possible to delete a fileset with immutable files. I remember Scott (maybe) mentioning it at the gpfs ug meeting in York, but I thought that was immutable file systems, which you have to destroy. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Frank Kraemer [kraemerf at de.ibm.com] Sent: 19 July 2015 13:45 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Immutable fileset features >I was wondering if anyone had looked at the immutable fileset features in 4.1.1? yes, Nils Haustein has see: https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insight_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Sun Jul 19 21:09:26 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sun, 19 Jul 2015 20:09:26 +0000 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: Message-ID: On 19/07/2015 13:45, "Frank Kraemer" wrote: >>I was wondering if anyone had looked at the immutable fileset features in >4.1.1? > >yes, Nils Haustein has see: > >https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insi >ght_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function I was re-reading some of this blog post and am a little confused. It talks about setting retention times by setting the ATIME from touch, or by using -E to mmchattr. Does that mean if a file is accessed, then the ATIME is updated and so the retention period is changed? What if our retention policy is based on last access time of file +period of time. I was thinking it would be useful to do a policy scan to find newly access files and then set the retention (either directly by policy if possible? Or by passing the file list to a script). Would this work or if the ATIME is overloaded, then I guess we can?t use this? Finally, is this a feature that is supported by IBM? 
The -E flag for mmchattr is neither in the man page nor the online docs at: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect rum.scale.v4r11.adm.doc/bl1adm_mmchattr.htm (My possibly incorrect understanding was that if its documented, then is supported, otherwise it might work)? Simon From jamiedavis at us.ibm.com Mon Jul 20 13:26:17 2015 From: jamiedavis at us.ibm.com (James Davis) Date: Mon, 20 Jul 2015 08:26:17 -0400 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: Message-ID: <201507200027.t6K0RD8b003417@d01av02.pok.ibm.com> Simon, I spoke to a tester who worked on this line item. She thinks mmchattr -E should have been documented. We will follow up. If it was an oversight it should be corrected soon. Jamie Jamie Davis GPFS Functional Verification Test (FVT) jamiedavis at us.ibm.com From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 19-07-15 04:09 PM Subject: Re: [gpfsug-discuss] Immutable fileset features Sent by: gpfsug-discuss-bounces at gpfsug.org On 19/07/2015 13:45, "Frank Kraemer" wrote: >>I was wondering if anyone had looked at the immutable fileset features in >4.1.1? > >yes, Nils Haustein has see: > >https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insi >ght_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function I was re-reading some of this blog post and am a little confused. It talks about setting retention times by setting the ATIME from touch, or by using -E to mmchattr. Does that mean if a file is accessed, then the ATIME is updated and so the retention period is changed? What if our retention policy is based on last access time of file +period of time. I was thinking it would be useful to do a policy scan to find newly access files and then set the retention (either directly by policy if possible? Or by passing the file list to a script). Would this work or if the ATIME is overloaded, then I guess we can?t use this? Finally, is this a feature that is supported by IBM? The -E flag for mmchattr is neither in the man page nor the online docs at: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect rum.scale.v4r11.adm.doc/bl1adm_mmchattr.htm (My possibly incorrect understanding was that if its documented, then is supported, otherwise it might work)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Luke.Raimbach at crick.ac.uk Mon Jul 20 08:02:01 2015 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Mon, 20 Jul 2015 07:02:01 +0000 Subject: [gpfsug-discuss] 4.1.1 immutable filesets In-Reply-To: References: Message-ID: Can I add to this list of questions? Apparently, one cannot set immutable, or append-only attributes on files / directories within an AFM cache. However, if I have an independent writer and set immutability at home, what does the AFM IW cache do about this? Or does this restriction just apply to entire filesets (which would make more sense)? Cheers, Luke. 
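On the policy-scan idea a couple of messages up, the plumbing at least is straightforward with an external list rule; a sketch under the assumption that whatever ends up setting the retention date (touch on atime, or the undocumented -E) is wrapped in a small helper script. The rule names, script path and the 7-day window are all invented:

  /* retention.pol -- hypothetical policy file */
  RULE 'ret_script' EXTERNAL LIST 'setret' EXEC '/usr/local/sbin/set_retention.sh'
  RULE 'recently_accessed' LIST 'setret'
       WHERE (CURRENT_TIMESTAMP - ACCESS_TIME) < INTERVAL '7' DAYS

run with something like

  mmapplypolicy /gpfs/gpfs1/archfset -P retention.pol

so that set_retention.sh receives the candidate file list and applies the retention command to each entry. Whether overloading ATIME makes the whole scheme self-defeating is still the open question above.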
-----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: 19 July 2015 11:45 To: gpfsug main discussion list Subject: [gpfsug-discuss] 4.1.1 immutable filesets I was wondering if anyone had looked at the immutable fileset features in 4.1.1? In particular I was looking at the iam compliant mode, but I've a couple of questions. * if I have an iam compliant fileset, and it contains immutable files or directories, can I still unlink and delete the filset? * will HSM work with immutable files? I.e. Can I migrate files to tape and restore them? The docs mention that extended attributes can be updated internally by dmapi, so I guess HSM might work? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From kallbac at iu.edu Wed Jul 22 11:50:58 2015 From: kallbac at iu.edu (Kristy Kallback-Rose) Date: Wed, 22 Jul 2015 06:50:58 -0400 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Message-ID: <203758A6-C7E0-4D3F-BA31-A130CF92DCBC@iu.edu> Yes interested, please post. We?ll probably keep running Samba separately, as we do today, for quite some time, but will be facing this transition at some point so we can be supported by IBM for Samba. On Jul 10, 2015, at 8:06 AM, Simon Thompson (Research Computing - IT Services) wrote: > So IBM came back and said what I was doing wasn?t supported. > > They did say that you can use ?user defined? authentication. Which I?ve > got working now on my environment (figured what I was doing wrong, and you > can?t use mmsmb to do some of the bits I need for it to work for user > defined mode for me...). But I still think it needs a patch to one of the > files for CES for use in user defined authentication. (Right now it > appears to remove all my ?user defined? settings from nsswitch.conf when > you stop CES/GPFS on a node). I?ve supplied my patch to IBM which works > for my case, we?ll see what they do about it? > > (If people are interested, I?ll gather my notes into a blog post). > > Simon > > On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: > >> Just to chime in as another interested party, we do something fairly >> similar but use sssd instead of nslcd. Very interested to see how >> accommodating the IBM Samba is to local configuration needs. >> >> Best, >> Kristy >> >> On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >> Services) wrote: >> >>> Hi, >>> >>> (sorry, lots of questions about this stuff at the moment!) >>> >>> I?m currently looking at removing the sernet smb configs we had >>> previously >>> and moving to IBM SMB. I?ve removed all the old packages and only now >>> have >>> gpfs.smb installed on the systems. >>> >>> I?m struggling to get the config tools to work for our environment. >>> >>> We have MS Windows AD Domain for authentication. For various reasons, >>> however doesn?t hold the UIDs/GIDs, which are instead held in a >>> different >>> LDAP directory. >>> >>> In the past, we?d configure the Linux servers running Samba so that >>> NSLCD >>> was configured to get details from the LDAP server. (e.g. 
getent passwd >>> would return the data for an AD user). The Linux boxes would also be >>> configured to use KRB5 authentication where users were allowed to ssh >>> etc >>> in for password authentication. >>> >>> So as far as Samba was concerned, it would do ?security = ADS? and then >>> we?d also have "idmap config * : backend = tdb2? >>> >>> I.e. Use Domain for authentication, but look locally for ID mapping >>> data. >>> >>> Now I can configured IBM SMB to use ADS for authentication: >>> >>> mmuserauth service create --type ad --data-access-method file >>> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >>> --idmap-role subordinate >>> >>> >>> However I can?t see anyway for me to manipulate the config so that it >>> doesn?t use autorid. Using this we end up with: >>> >>> mmsmb config list | grep -i idmap >>> idmap config * : backend autorid >>> idmap config * : range 10000000-299999999 >>> idmap config * : rangesize 1000000 >>> idmap config * : read only yes >>> idmap:cache no >>> >>> >>> It also adds: >>> >>> mmsmb config list | grep -i auth >>> auth methods guest sam winbind >>> >>> (though I don?t think that is a problem). >>> >>> >>> I also can?t change the idmap using the mmsmb command (I think would >>> look >>> like this): >>> # mmsmb config change --option="idmap config * : backend=tdb2" >>> idmap config * : backend=tdb2: [E] Unsupported smb option. More >>> information about smb options is availabe in the man page. >>> >>> >>> >>> I can?t see anything in the docs at: >>> >>> http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>> ct >>> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >>> >>> That give me a clue how to do what I want. >>> >>> I?d be happy to do some mixture of AD for authentication and LDAP for >>> lookups (rather than just falling back to ?local? from nslcd), but I >>> can?t >>> see a way to do this, and ?manual? seems to stop ADS authentication in >>> Samba. >>> >>> Anyone got any suggestions? >>> >>> >>> Thanks >>> >>> Simon >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 495 bytes Desc: Message signed with OpenPGP using GPGMail URL: From S.J.Thompson at bham.ac.uk Wed Jul 22 11:59:56 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 22 Jul 2015 10:59:56 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: <203758A6-C7E0-4D3F-BA31-A130CF92DCBC@iu.edu> References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> , <203758A6-C7E0-4D3F-BA31-A130CF92DCBC@iu.edu> Message-ID: Hi Kristy, Funny you should ask, I wrote it up last night... http://www.roamingzebra.co.uk/2015/07/smb-protocol-support-with-spectrum.html They did tell me it was all tested with Samba 4, so should work, subject to you checking your own smb config options. 
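For anyone wanting to see roughly what the "user defined" route looks like, a sketch is below. The realm, workgroup and idmap range are placeholders, and the idmap lines are exactly the ones mmsmb refuses to accept, so they would presumably have to be pushed into the underlying Samba registry (e.g. with net conf setparm), which is the unsupported part; check the blog post and your own environment before copying any of it.

# Put CES file authentication into user-defined mode, leaving nsswitch, krb5 and
# the Samba id-mapping plumbing to the administrator (4.1.1 syntax, as I read it):
mmuserauth service remove --data-access-method file
mmuserauth service create --data-access-method file --type userdefined

# Target Samba behaviour from this thread: AD for authentication, local/LDAP
# lookups for UID/GID mapping (placeholder values):
#   security = ADS
#   realm = DOMAIN.ADF
#   workgroup = DOMAIN
#   idmap config * : backend = tdb2
#   idmap config * : range = 10000000-299999999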
But i like not having to build it myself now ;) The move was actually pretty easy and in theory you can run mixed over existing nodes and upgraded protocol nodes, but you might need a different clustered name. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Kristy Kallback-Rose [kallbac at iu.edu] Sent: 22 July 2015 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SMB support and config Yes interested, please post. We?ll probably keep running Samba separately, as we do today, for quite some time, but will be facing this transition at some point so we can be supported by IBM for Samba. On Jul 10, 2015, at 8:06 AM, Simon Thompson (Research Computing - IT Services) wrote: > So IBM came back and said what I was doing wasn?t supported. > > They did say that you can use ?user defined? authentication. Which I?ve > got working now on my environment (figured what I was doing wrong, and you > can?t use mmsmb to do some of the bits I need for it to work for user > defined mode for me...). But I still think it needs a patch to one of the > files for CES for use in user defined authentication. (Right now it > appears to remove all my ?user defined? settings from nsswitch.conf when > you stop CES/GPFS on a node). I?ve supplied my patch to IBM which works > for my case, we?ll see what they do about it? > > (If people are interested, I?ll gather my notes into a blog post). > > Simon > > On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: > >> Just to chime in as another interested party, we do something fairly >> similar but use sssd instead of nslcd. Very interested to see how >> accommodating the IBM Samba is to local configuration needs. >> >> Best, >> Kristy >> >> On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >> Services) wrote: >> >>> Hi, >>> >>> (sorry, lots of questions about this stuff at the moment!) >>> >>> I?m currently looking at removing the sernet smb configs we had >>> previously >>> and moving to IBM SMB. I?ve removed all the old packages and only now >>> have >>> gpfs.smb installed on the systems. >>> >>> I?m struggling to get the config tools to work for our environment. >>> >>> We have MS Windows AD Domain for authentication. For various reasons, >>> however doesn?t hold the UIDs/GIDs, which are instead held in a >>> different >>> LDAP directory. >>> >>> In the past, we?d configure the Linux servers running Samba so that >>> NSLCD >>> was configured to get details from the LDAP server. (e.g. getent passwd >>> would return the data for an AD user). The Linux boxes would also be >>> configured to use KRB5 authentication where users were allowed to ssh >>> etc >>> in for password authentication. >>> >>> So as far as Samba was concerned, it would do ?security = ADS? and then >>> we?d also have "idmap config * : backend = tdb2? >>> >>> I.e. Use Domain for authentication, but look locally for ID mapping >>> data. >>> >>> Now I can configured IBM SMB to use ADS for authentication: >>> >>> mmuserauth service create --type ad --data-access-method file >>> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >>> --idmap-role subordinate >>> >>> >>> However I can?t see anyway for me to manipulate the config so that it >>> doesn?t use autorid. 
Using this we end up with: >>> >>> mmsmb config list | grep -i idmap >>> idmap config * : backend autorid >>> idmap config * : range 10000000-299999999 >>> idmap config * : rangesize 1000000 >>> idmap config * : read only yes >>> idmap:cache no >>> >>> >>> It also adds: >>> >>> mmsmb config list | grep -i auth >>> auth methods guest sam winbind >>> >>> (though I don?t think that is a problem). >>> >>> >>> I also can?t change the idmap using the mmsmb command (I think would >>> look >>> like this): >>> # mmsmb config change --option="idmap config * : backend=tdb2" >>> idmap config * : backend=tdb2: [E] Unsupported smb option. More >>> information about smb options is availabe in the man page. >>> >>> >>> >>> I can?t see anything in the docs at: >>> >>> http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>> ct >>> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >>> >>> That give me a clue how to do what I want. >>> >>> I?d be happy to do some mixture of AD for authentication and LDAP for >>> lookups (rather than just falling back to ?local? from nslcd), but I >>> can?t >>> see a way to do this, and ?manual? seems to stop ADS authentication in >>> Samba. >>> >>> Anyone got any suggestions? >>> >>> >>> Thanks >>> >>> Simon >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mhabib73 at gmail.com Wed Jul 22 13:58:51 2015 From: mhabib73 at gmail.com (Muhammad Habib) Date: Wed, 22 Jul 2015 08:58:51 -0400 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A625BE.9000809@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> <55A625BE.9000809@ebi.ac.uk> Message-ID: did you implement it ? looks ok. All daemon traffic should be going through black network including inter-cluster daemon traffic ( assume black subnet routable). All data traffic should be going through the blue network. You may need to run iptrace or tcpdump to make sure proper network are in use. You can always open a PMR if you having issue during the configuration . Thanks On Wed, Jul 15, 2015 at 5:19 AM, Salvatore Di Nardo wrote: > Thanks for the input.. this is actually very interesting! > > Reading here: > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview > , > specifically the " Using more than one network" part it seems to me that > this way we should be able to split the lease/token/ping from the data. > > Supposing that I implement a GSS cluster with only NDS and a second > cluster with only clients: > > > > As far i understood if on the NDS cluster add first the subnet > 10.20.0.0/16 and then 10.30.0.0 is should use the internal network for > all the node-to-node comunication, leaving the 10.30.0.0/30 only for data > traffic witht he remote cluster ( the clients). 
Similarly, in the client > cluster, adding first 10.10.0.0/16 and then 10.30.0.0, will guarantee > than the node-to-node comunication pass trough a different interface there > the data is passing. Since the client are just "clients" the traffic trough > 10.10.0.0/16 should be minimal (only token ,lease, ping and so on ) and > not affected by the rest. Should be possible at this point move aldo the > "admin network" on the internal interface, so we effectively splitted all > the "non data" traffic on a dedicated interface. > > I'm wondering if I'm missing something, and in case i didn't, what could > be the real traffic in the internal (black) networks ( 1g link its fine or > i still need 10g for that). Another thing I I'm wondering its the load of > the "non data" traffic between the clusters.. i suppose some "daemon > traffic" goes trough the blue interface for the inter-cluster > communication. > > > Any thoughts ? > > Salvatore > > On 13/07/15 18:19, Muhammad Habib wrote: > > Did you look at "subnets" parameter used with "mmchconfig" command. I > think you can use order list of subnets for daemon communication and then > actual daemon interface can be used for data transfer. When the GPFS will > start it will use actual daemon interface for communication , however , > once its started , it will use the IPs from the subnet list whichever > coming first in the list. To further validate , you can put network > sniffer before you do actual implementation or alternatively you can open a > PMR with IBM. > > If your cluster having expel situation , you may fine tune your cluster > e.g. increase ping timeout period , having multiple NSD servers and > distributing filesystems across these NSD servers. Also critical servers > can have HBA cards installed for direct I/O through fiber. > > Thanks > > On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick wrote: > >> Hi, >> >> Yes having separate data and management networks has been critical for >> us for keeping health monitoring/communication unimpeded by data movement. >> >> Not as important, but you can also tune the networks differently >> (packet sizes, buffer sizes, SAK, etc) which can help. >> >> Jason >> >> On Jul 13, 2015, at 7:25 AM, Vic Cornell wrote: >> >> Hi Salvatore, >> >> I agree that that is what the manual - and some of the wiki entries say. >> >> However , when we have had problems (typically congestion) with >> ethernet networks in the past (20GbE or 40GbE) we have resolved them by >> setting up a separate ?Admin? network. >> >> The before and after cluster health we have seen measured in number of >> expels and waiters has been very marked. >> >> Maybe someone ?in the know? could comment on this split. >> >> Regards, >> >> Vic >> >> >> On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: >> >> Hello Vic. >> We are currently draining our gpfs to do all the recabling to add a >> management network, but looking what the admin interface does ( man >> mmchnode ) it says something different: >> >> --admin-interface={hostname | ip_address} >> Specifies the name of the node to be used by >> GPFS administration commands when communicating between nodes. The admin >> node name must be specified as an IP >> address or a hostname that is resolved by the >> host command to the desired IP address. If the keyword DEFAULT is >> specified, the admin interface for the >> node is set to be equal to the daemon interface >> for the node. >> >> >> So, seems used only for commands propagation, hence have nothing to do >> with the node-to-node traffic. 
Infact the other interface description is: >> >> --daemon-interface={hostname | ip_address} >> Specifies the host name or IP address *to be >> used by the GPFS daemons for node-to-node communication*. The host name >> or IP address must refer to the commu- >> nication adapter over which the GPFS daemons >> communicate. Alias interfaces are not allowed. Use the original address or >> a name that is resolved by the >> host command to that original address. >> >> >> The "expired lease" issue and file locking mechanism a( most of our >> expells happens when 2 clients try to write in the same file) are exactly >> node-to node-comunication, so im wondering what's the point to separate >> the "admin network". I want to be sure to plan the right changes before we >> do a so massive task. We are talking about adding a new interface on 700 >> clients, so the recabling work its not small. >> >> >> Regards, >> Salvatore >> >> >> >> On 13/07/15 14:00, Vic Cornell wrote: >> >> Hi Salavatore, >> >> Does your GSS have the facility for a 1GbE ?management? network? If so >> I think that changing the ?admin? node names of the cluster members to a >> set of IPs on the management network would give you the split that you need. >> >> What about the clients? Can they also connect to a separate admin >> network? >> >> Remember that if you are using multi-cluster all of the nodes in both >> networks must share the same admin network. >> >> Kind Regards, >> >> Vic >> >> >> On 13 Jul 2015, at 13:31, Salvatore Di Nardo wrote: >> >> Anyone? >> >> On 10/07/15 11:07, Salvatore Di Nardo wrote: >> >> Hello guys. >> Quite a while ago i mentioned that we have a big expel issue on our gss >> ( first gen) and white a lot people suggested that the root cause could be >> that we use the same interface for all the traffic, and that we should >> split the data network from the admin network. Finally we could plan a >> downtime and we are migrating the data out so, i can soon safelly play with >> the change, but looking what exactly i should to do i'm a bit puzzled. Our >> mmlscluster looks like this: >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> It was my understanding that the "admin node" should use a different >> interface ( a 1g link copper should be fine), while the daemon node is >> where the data was passing , so should point to the bonded 10g interfaces. >> but when i read the mmchnode man page i start to be quite confused. It says: >> >> --daemon-interface={hostname | ip_address} >> Specifies the host name or IP address *to be >> used by the GPFS daemons for node-to-node communication*. 
The host name >> or IP address must refer to the communication adapter over which the GPFS >> daemons communicate. >> Alias interfaces are not allowed. Use the >> original address or a name that is resolved by the host command to that >> original address. >> >> --admin-interface={hostname | ip_address} >> Specifies the name of the node to be used by >> GPFS administration commands when communicating between nodes. The admin >> node name must be specified as an IP address or a hostname that is resolved >> by the host command >> to the desired IP address. If the keyword >> DEFAULT is specified, the admin interface for the node is set to be equal >> to the daemon interface for the node. >> >> What exactly means "node-to node-communications" ? >> Means DATA or also the "lease renew", and the token communication between >> the clients to get/steal the locks to be able to manage concurrent write to >> thr same file? >> Since we are getting expells ( especially when several clients contends >> the same file ) i assumed i have to split this type of packages from the >> data stream, but reading the documentation it looks to me that those >> internal comunication between nodes use the daemon-interface wich i suppose >> are used also for the data. so HOW exactly i can split them? >> >> >> Thanks in advance, >> Salvatore >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > > -- > This communication contains confidential information intended only for the > persons to whom it is addressed. Any other distribution, copying or > disclosure is strictly prohibited. If you have received this communication > in error, please notify the sender and delete this e-mail message > immediately. > > Le pr?sent message contient des renseignements de nature confidentielle > r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, > distribution, divulgation, utilisation ou reproduction de la pr?sente > communication, et de tout fichier qui y est joint, est strictement > interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, > veuillez informer imm?diatement l'exp?diteur et supprimer le message de > votre ordinateur et de votre serveur. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This communication contains confidential information intended only for the persons to whom it is addressed. Any other distribution, copying or disclosure is strictly prohibited. 
If you have received this communication in error, please notify the sender and delete this e-mail message immediately. Le pr?sent message contient des renseignements de nature confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, distribution, divulgation, utilisation ou reproduction de la pr?sente communication, et de tout fichier qui y est joint, est strictement interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur et supprimer le message de votre ordinateur et de votre serveur. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpfs.jpg Type: image/jpeg Size: 28904 bytes Desc: not available URL: From sdinardo at ebi.ac.uk Wed Jul 22 14:51:04 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 22 Jul 2015 14:51:04 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> <55A625BE.9000809@ebi.ac.uk> Message-ID: <55AF9FC8.6050107@ebi.ac.uk> Hello, no, still didn't anything because we have to drain 2PB data , into a slower storage.. so it will take few weeks. I expect doing it the second half of August. Will let you all know the results once done and properly tested. Salvatore On 22/07/15 13:58, Muhammad Habib wrote: > did you implement it ? looks ok. All daemon traffic should be going > through black network including inter-cluster daemon traffic ( assume > black subnet routable). All data traffic should be going through the > blue network. You may need to run iptrace or tcpdump to make sure > proper network are in use. You can always open a PMR if you having > issue during the configuration . > > Thanks > > On Wed, Jul 15, 2015 at 5:19 AM, Salvatore Di Nardo > > wrote: > > Thanks for the input.. this is actually very interesting! > > Reading here: > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview > > , > specifically the " Using more than one network" part it seems to > me that this way we should be able to split the lease/token/ping > from the data. > > Supposing that I implement a GSS cluster with only NDS and a > second cluster with only clients: > > > > As far i understood if on the NDS cluster add first the subnet > 10.20.0.0/16 and then 10.30.0.0 is should > use the internal network for all the node-to-node comunication, > leaving the 10.30.0.0/30 only for data > traffic witht he remote cluster ( the clients). Similarly, in the > client cluster, adding first 10.10.0.0/16 > and then 10.30.0.0, will guarantee than the node-to-node > comunication pass trough a different interface there the data is > passing. Since the client are just "clients" the traffic trough > 10.10.0.0/16 should be minimal (only token > ,lease, ping and so on ) and not affected by the rest. Should be > possible at this point move aldo the "admin network" on the > internal interface, so we effectively splitted all the "non data" > traffic on a dedicated interface. > > I'm wondering if I'm missing something, and in case i didn't, what > could be the real traffic in the internal (black) networks ( 1g > link its fine or i still need 10g for that). 
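In case a concrete example helps with planning, a sketch of the configuration the diagram implies is below. The addresses and node names are just the ones from the example, the exact behaviour is worth confirming with IBM (a PMR, as suggested above), and my understanding is that the subnets setting is only read when the daemon starts, so it needs to be scheduled with a restart.

# On the NSD/GSS cluster: prefer the private 10.20.0.0 network for daemon traffic,
# falling back to the shared 10.30.0.0 network for the remote (client) cluster:
mmchconfig subnets="10.20.0.0 10.30.0.0"
# On the client cluster, the same idea with its own private network listed first:
mmchconfig subnets="10.10.0.0 10.30.0.0"
# Admin command traffic can then be moved onto the management leg per node, e.g.
# (gss01a-mgmt is a made-up hostname for the 1GbE interface):
mmchnode --admin-interface=gss01a-mgmt.ebi.ac.uk -N gss01a.ebi.ac.uk
# Afterwards, mmdiag --network (or tcpdump, as suggested) shows which addresses
# the daemon connections actually ended up using.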
Another thing I I'm > wondering its the load of the "non data" traffic between the > clusters.. i suppose some "daemon traffic" goes trough the blue > interface for the inter-cluster communication. > > > Any thoughts ? > > Salvatore > > On 13/07/15 18:19, Muhammad Habib wrote: >> Did you look at "subnets" parameter used with "mmchconfig" >> command. I think you can use order list of subnets for daemon >> communication and then actual daemon interface can be used for >> data transfer. When the GPFS will start it will use actual >> daemon interface for communication , however , once its started , >> it will use the IPs from the subnet list whichever coming first >> in the list. To further validate , you can put network sniffer >> before you do actual implementation or alternatively you can open >> a PMR with IBM. >> >> If your cluster having expel situation , you may fine tune your >> cluster e.g. increase ping timeout period , having multiple NSD >> servers and distributing filesystems across these NSD servers. >> Also critical servers can have HBA cards installed for direct I/O >> through fiber. >> >> Thanks >> >> On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick > > wrote: >> >> Hi, >> >> Yes having separate data and management networks has been >> critical for us for keeping health monitoring/communication >> unimpeded by data movement. >> >> Not as important, but you can also tune the networks >> differently (packet sizes, buffer sizes, SAK, etc) which can >> help. >> >> Jason >> >> On Jul 13, 2015, at 7:25 AM, Vic Cornell >> > wrote: >> >>> Hi Salvatore, >>> >>> I agree that that is what the manual - and some of the wiki >>> entries say. >>> >>> However , when we have had problems (typically congestion) >>> with ethernet networks in the past (20GbE or 40GbE) we have >>> resolved them by setting up a separate ?Admin? network. >>> >>> The before and after cluster health we have seen measured in >>> number of expels and waiters has been very marked. >>> >>> Maybe someone ?in the know? could comment on this split. >>> >>> Regards, >>> >>> Vic >>> >>> >>>> On 13 Jul 2015, at 14:29, Salvatore Di Nardo >>>> > wrote: >>>> >>>> Hello Vic. >>>> We are currently draining our gpfs to do all the recabling >>>> to add a management network, but looking what the admin >>>> interface does ( man mmchnode ) it says something different: >>>> >>>> --admin-interface={hostname | ip_address} >>>> Specifies the name of the node to be used by GPFS >>>> administration commands when communicating between >>>> nodes. The admin node name must be specified as an IP >>>> address or a hostname that is resolved by the host >>>> command to the desired IP address. If the keyword >>>> DEFAULT is specified, the admin interface for the >>>> node is set to be equal to the daemon interface for >>>> the node. >>>> >>>> >>>> So, seems used only for commands propagation, hence have >>>> nothing to do with the node-to-node traffic. Infact the >>>> other interface description is: >>>> >>>> --daemon-interface={hostname | ip_address} >>>> Specifies the host name or IP address _*to be used >>>> by the GPFS daemons for node-to-node >>>> communication*_. The host name or IP address must >>>> refer to the commu- >>>> nication adapter over which the GPFS daemons >>>> communicate. Alias interfaces are not allowed. Use >>>> the original address or a name that is resolved >>>> by the >>>> host command to that original address. 
>>>> >>>> >>>> The "expired lease" issue and file locking mechanism a( >>>> most of our expells happens when 2 clients try to write in >>>> the same file) are exactly node-to node-comunication, so >>>> im wondering what's the point to separate the "admin >>>> network". I want to be sure to plan the right changes >>>> before we do a so massive task. We are talking about adding >>>> a new interface on 700 clients, so the recabling work its >>>> not small. >>>> >>>> >>>> Regards, >>>> Salvatore >>>> >>>> >>>> >>>> On 13/07/15 14:00, Vic Cornell wrote: >>>>> Hi Salavatore, >>>>> >>>>> Does your GSS have the facility for a 1GbE ?management? >>>>> network? If so I think that changing the ?admin? node >>>>> names of the cluster members to a set of IPs on the >>>>> management network would give you the split that you need. >>>>> >>>>> What about the clients? Can they also connect to a >>>>> separate admin network? >>>>> >>>>> Remember that if you are using multi-cluster all of the >>>>> nodes in both networks must share the same admin network. >>>>> >>>>> Kind Regards, >>>>> >>>>> Vic >>>>> >>>>> >>>>>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo >>>>>> > wrote: >>>>>> >>>>>> Anyone? >>>>>> >>>>>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>>>>> Hello guys. >>>>>>> Quite a while ago i mentioned that we have a big expel >>>>>>> issue on our gss ( first gen) and white a lot people >>>>>>> suggested that the root cause could be that we use the >>>>>>> same interface for all the traffic, and that we should >>>>>>> split the data network from the admin network. Finally >>>>>>> we could plan a downtime and we are migrating the data >>>>>>> out so, i can soon safelly play with the change, but >>>>>>> looking what exactly i should to do i'm a bit puzzled. >>>>>>> Our mmlscluster looks like this: >>>>>>> >>>>>>> GPFS cluster information >>>>>>> ======================== >>>>>>> GPFS cluster name: GSS.ebi.ac.uk >>>>>>> >>>>>>> GPFS cluster id: 17987981184946329605 >>>>>>> GPFS UID domain: GSS.ebi.ac.uk >>>>>>> >>>>>>> Remote shell command: /usr/bin/ssh >>>>>>> Remote file copy command: /usr/bin/scp >>>>>>> >>>>>>> GPFS cluster configuration servers: >>>>>>> ----------------------------------- >>>>>>> Primary server: gss01a.ebi.ac.uk >>>>>>> >>>>>>> Secondary server: gss02b.ebi.ac.uk >>>>>>> >>>>>>> >>>>>>> Node Daemon node name IP address Admin >>>>>>> node name Designation >>>>>>> ----------------------------------------------------------------------- >>>>>>> 1 gss01a.ebi.ac.uk >>>>>>> 10.7.28.2 >>>>>>> gss01a.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 2 gss01b.ebi.ac.uk >>>>>>> 10.7.28.3 >>>>>>> gss01b.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 3 gss02a.ebi.ac.uk >>>>>>> 10.7.28.67 >>>>>>> gss02a.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 4 gss02b.ebi.ac.uk >>>>>>> 10.7.28.66 >>>>>>> gss02b.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 5 gss03a.ebi.ac.uk >>>>>>> 10.7.28.34 >>>>>>> gss03a.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 6 gss03b.ebi.ac.uk >>>>>>> 10.7.28.35 >>>>>>> gss03b.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> >>>>>>> >>>>>>> It was my understanding that the "admin node" should use >>>>>>> a different interface ( a 1g link copper should be >>>>>>> fine), while the daemon node is where the data was >>>>>>> passing , so should point to the bonded 10g interfaces. >>>>>>> but when i read the mmchnode man page i start to be >>>>>>> quite confused. 
It says: >>>>>>> >>>>>>> --daemon-interface={hostname | ip_address} >>>>>>> Specifies the host name or IP address _*to be used by >>>>>>> the GPFS daemons for node-to-node communication*_. The >>>>>>> host name or IP address must refer to the communication >>>>>>> adapter over which the GPFS daemons communicate. >>>>>>> Alias interfaces are not allowed. Use the >>>>>>> original address or a name that is resolved by the host >>>>>>> command to that original address. >>>>>>> >>>>>>> --admin-interface={hostname | ip_address} >>>>>>> Specifies the name of the node to be used by GPFS >>>>>>> administration commands when communicating between >>>>>>> nodes. The admin node name must be specified as an IP >>>>>>> address or a hostname that is resolved by the host command >>>>>>> tothe desired IP address. If the keyword >>>>>>> DEFAULT is specified, the admin interface for the node >>>>>>> is set to be equal to the daemon interface for the node. >>>>>>> >>>>>>> What exactly means "node-to node-communications" ? >>>>>>> Means DATA or also the "lease renew", and the token >>>>>>> communication between the clients to get/steal the locks >>>>>>> to be able to manage concurrent write to thr same file? >>>>>>> Since we are getting expells ( especially when several >>>>>>> clients contends the same file ) i assumed i have to >>>>>>> split this type of packages from the data stream, but >>>>>>> reading the documentation it looks to me that those >>>>>>> internal comunication between nodes use the >>>>>>> daemon-interface wich i suppose are used also for the >>>>>>> data. so HOW exactly i can split them? >>>>>>> >>>>>>> >>>>>>> Thanks in advance, >>>>>>> Salvatore >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> gpfsug-discuss mailing list >>>>>>> gpfsug-discuss atgpfsug.org >>>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>>> >>>>>> _______________________________________________ >>>>>> gpfsug-discuss mailing list >>>>>> gpfsug-discuss at gpfsug.org >>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> -- >> This communication contains confidential information intended >> only for the persons to whom it is addressed. Any other >> distribution, copying or disclosure is strictly prohibited. If >> you have received this communication in error, please notify the >> sender and delete this e-mail message immediately. >> >> Le pr?sent message contient des renseignements de nature >> confidentielle r?serv?s uniquement ? l'usage du destinataire. >> Toute diffusion, distribution, divulgation, utilisation ou >> reproduction de la pr?sente communication, et de tout fichier qui >> y est joint, est strictement interdite. Si vous avez re?u le >> pr?sent message ?lectronique par erreur, veuillez informer >> imm?diatement l'exp?diteur et supprimer le message de votre >> ordinateur et de votre serveur. 
>> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss atgpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -- > This communication contains confidential information intended only for > the persons to whom it is addressed. Any other distribution, copying > or disclosure is strictly prohibited. If you have received this > communication in error, please notify the sender and delete this > e-mail message immediately. > > Le pr?sent message contient des renseignements de nature > confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute > diffusion, distribution, divulgation, utilisation ou reproduction de > la pr?sente communication, et de tout fichier qui y est joint, est > strictement interdite. Si vous avez re?u le pr?sent message > ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur > et supprimer le message de votre ordinateur et de votre serveur. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 28904 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Jul 27 22:24:11 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 27 Jul 2015 21:24:11 +0000 Subject: [gpfsug-discuss] GPFS and Community Scientific Cloud Message-ID: Hi Ulf, Thanks for the email, as suggested, I'm copying this to the GPFS UG mailing list as well as I'm sure the discussion is of interest to others. I guess what we're looking to do is to have arbitrary VMs running provided by users (I.e. Completely untrusted), but to provide them a way to get secure access to only their data. Right now we can't give them a GPFS client as this is too trusting, I was wondering how easy it would be for us to implement something like: User has a VM User runs 'kinit user at DOMAIN' to gain kerberos ticket and can then securely gain access to only their files from my NFS server. I also mentioned Janet ASSENT, which is a relatively recent project: https://jisc.ac.uk/assent (It was piloted as Janet Moonshot). Which builds on top of SAML to provide other software access to federation. My understanding is that site-specific UID mapping is needed (e.g. On the NFS/GPFS server). Simon >I have some experience with the following questions: > >> NFS just isn?t built for security really. I guess NFSv4 with KRB5 is >> one option to look at, with user based credentials. That might just >> about be feasible if the user were do authenticate with kinit before >> being able to access NFSv4 mounted files. I.e. Its done at the user >> level rather than the instance level. That might be an interesting >> project as a feasibility study to look at, will it work? How would >> we integrate into a federated access management system (something >> like UK Federation and ABFAB/Moonshot/Assent maybe?). Could we >> provide easy steps for a user in a VM to follow? Can we even make it >> work with Ganesha in such an environment? > > >Kerberized NFSv3 and Kerberized NFSv4 provide nearly the same level of >security. 
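To make the kinit-before-access idea concrete, the user-visible steps on a VM might be as little as the following. The server name, export path and realm are invented, the VM is assumed to already have a krb5.conf for the realm plus a credential for the mount itself, and the export would need sec=krb5/krb5i/krb5p enabled on the CES/Ganesha side; whether that last part works cleanly in 4.1.1 is exactly the open question.

# one-off, on the VM (hypothetical names throughout):
mount -t nfs4 -o sec=krb5p,vers=4.0 ces-nfs.example.ac.uk:/gpfs/projects /mnt/projects

# then per user:
kinit user@CAMPUS.EXAMPLE.AC.UK   # user obtains their own Kerberos ticket
ls /mnt/projects/mydata           # NFS requests now carry that user's identity
kdestroy                          # drop the credential when finished
# The server still has to map the principal to the right UID/GID, which is the
# site-specific ID-mapping point raised above.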
Kerberos makes the difference and not the NFS version. I have >posted some background information to the GPFS forum: >http://ibm.co/1VFLUR4 > >Kerberized NFSv4 has the advantage that it allows different UID/GID ranges >on NFS server and NFS client. I have led a proof-of-concept where we have >used this feature to provide secure data access to personalized patient >data for multiple tenants where the tenants had conflicting UID/GID >ranges. >I have some material which I will share via the GPFS forum. > >UK Federation seems to be based on SAML/Shibboleth. Unfortunately there is >no easy integration of network file protocols such as NFS and SMB and >SAML/Shibboleth, because file protocols require attributes which are >typically not stored in SAML/Shibboleth. Fortunately I provided technical >guidance to a customer who exactly implemented this integration in order >to >provide secure file service to multiple universities, again with >conflicting UID/GID ranges. I need some time to write it up and publish >it. From martin.gasthuber at desy.de Tue Jul 28 17:28:44 2015 From: martin.gasthuber at desy.de (Martin Gasthuber) Date: Tue, 28 Jul 2015 18:28:44 +0200 Subject: [gpfsug-discuss] fast ACL alter solution Message-ID: Hi, since a few months we're running a new infrastructure, with the core built on GPFS (4.1.0.8), for 'light source - X-Rays' experiments local at the site. The system is used for the data acquisition chain, data analysis, data exports and archive. Right now we got new detector types (homebuilt, experimental) generating millions of small files - the last run produced ~9 million files at 64 to 128K in size ;-). In our setup, the files gets copied to a (user accessible) GPFS instance which controls the access by NFSv4 ACLs (only !) and from time to time, we had to modify these ACLs (add/remove user/group etc.). Doing a (non policy-run based) simple approach, changing 9 million files requires ~200 hours to run - which we consider not really a good option. Running mmgetacl/mmputacl whithin a policy-run will clearly speed that up - but the biggest time consuming operations are the get and put ACL ops. Is anybody aware of any faster ACL access operation (whithin the policy-run) - or even a 'mod-acl' operation ? best regards, Martin From jonathan at buzzard.me.uk Tue Jul 28 19:06:30 2015 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 28 Jul 2015 19:06:30 +0100 Subject: [gpfsug-discuss] fast ACL alter solution In-Reply-To: References: Message-ID: <55B7C4A6.9020205@buzzard.me.uk> On 28/07/15 17:28, Martin Gasthuber wrote: > Hi, > > since a few months we're running a new infrastructure, with the core > built on GPFS (4.1.0.8), for 'light source - X-Rays' experiments > local at the site. The system is used for the data acquisition chain, > data analysis, data exports and archive. Right now we got new > detector types (homebuilt, experimental) generating millions of small > files - the last run produced ~9 million files at 64 to 128K in size > ;-). In our setup, the files gets copied to a (user accessible) GPFS > instance which controls the access by NFSv4 ACLs (only !) and from > time to time, we had to modify these ACLs (add/remove user/group > etc.). Doing a (non policy-run based) simple approach, changing 9 > million files requires ~200 hours to run - which we consider not > really a good option. Running mmgetacl/mmputacl whithin a policy-run > will clearly speed that up - but the biggest time consuming > operations are the get and put ACL ops. 
Is anybody aware of any > faster ACL access operation (whithin the policy-run) - or even a > 'mod-acl' operation ? > In the past IBM have said that their expectations are that the ACL's are set via Windows on remote workstations and not from the command line on the GPFS servers themselves!!! Crazy I know. There really needs to be a mm version of the NFSv4 setfacl/nfs4_getfacl commands that ideally makes use of the fast inode traversal features to make things better. In the past I wrote some C code that set specific ACL's on files. This however was to deal with migrating files onto a system and needed to set initial ACL's and didn't make use of the fast traversal features and is completely unpolished. A good starting point would probably be the FreeBSD setfacl/getfacl tools, that at least was my plan but I have never gotten around to it. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From TROPPENS at de.ibm.com Wed Jul 29 09:02:59 2015 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 29 Jul 2015 10:02:59 +0200 Subject: [gpfsug-discuss] GPFS and Community Scientific Cloud In-Reply-To: References: Message-ID: Hi Simon, I have started to draft a response, but it gets longer and longer. I need some more time to respond. Best regards, Ulf. -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 27.07.2015 23:24 Subject: Re: [gpfsug-discuss] GPFS and Community Scientific Cloud Sent by: gpfsug-discuss-bounces at gpfsug.org Hi Ulf, Thanks for the email, as suggested, I'm copying this to the GPFS UG mailing list as well as I'm sure the discussion is of interest to others. I guess what we're looking to do is to have arbitrary VMs running provided by users (I.e. Completely untrusted), but to provide them a way to get secure access to only their data. Right now we can't give them a GPFS client as this is too trusting, I was wondering how easy it would be for us to implement something like: User has a VM User runs 'kinit user at DOMAIN' to gain kerberos ticket and can then securely gain access to only their files from my NFS server. I also mentioned Janet ASSENT, which is a relatively recent project: https://jisc.ac.uk/assent (It was piloted as Janet Moonshot). Which builds on top of SAML to provide other software access to federation. My understanding is that site-specific UID mapping is needed (e.g. On the NFS/GPFS server). Simon >I have some experience with the following questions: > >> NFS just isn?t built for security really. I guess NFSv4 with KRB5 is >> one option to look at, with user based credentials. That might just >> about be feasible if the user were do authenticate with kinit before >> being able to access NFSv4 mounted files. I.e. Its done at the user >> level rather than the instance level. That might be an interesting >> project as a feasibility study to look at, will it work? How would >> we integrate into a federated access management system (something >> like UK Federation and ABFAB/Moonshot/Assent maybe?). Could we >> provide easy steps for a user in a VM to follow? Can we even make it >> work with Ganesha in such an environment? 
> > >Kerberized NFSv3 and Kerberized NFSv4 provide nearly the same level of >security. Kerberos makes the difference and not the NFS version. I have >posted some background information to the GPFS forum: >http://ibm.co/1VFLUR4 > >Kerberized NFSv4 has the advantage that it allows different UID/GID ranges >on NFS server and NFS client. I have led a proof-of-concept where we have >used this feature to provide secure data access to personalized patient >data for multiple tenants where the tenants had conflicting UID/GID >ranges. >I have some material which I will share via the GPFS forum. > >UK Federation seems to be based on SAML/Shibboleth. Unfortunately there is >no easy integration of network file protocols such as NFS and SMB and >SAML/Shibboleth, because file protocols require attributes which are >typically not stored in SAML/Shibboleth. Fortunately I provided technical >guidance to a customer who exactly implemented this integration in order >to >provide secure file service to multiple universities, again with >conflicting UID/GID ranges. I need some time to write it up and publish >it. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chair at gpfsug.org Thu Jul 30 21:36:07 2015 From: chair at gpfsug.org (chair-gpfsug.org) Date: Thu, 30 Jul 2015 21:36:07 +0100 Subject: [gpfsug-discuss] July Meet the devs Message-ID: I've heard some great feedback about the July meet the devs held at IBM Warwick this week. Thanks to Ross and Patrick at IBM and Clare for coordinating the registration for this! Jez has a few photos so we'll try and get those uploaded in the next week or so to the website. Simon (GPFS UG Chair) From secretary at gpfsug.org Wed Jul 1 09:00:51 2015 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Wed, 01 Jul 2015 09:00:51 +0100 Subject: [gpfsug-discuss] Meet the Developers Message-ID: Dear All, We are planning the next 'Meet the Devs' event for Wednesday 29th July, 11am-3pm. Depending on interest, we are looking to hold in either Manchester or Warwick. The agenda promises to be hands on and give you the opportunity to speak face to face with the developers of GPFS. Guideline agenda: * Data analytic workloads - development to show and tell UK work on establishing use cases and tighter integration of Spark on top of GPFS * Show the GUI coming in 4.2 * Discuss 4.2 and beyond roadmap * How would you like IP management to work for protocol access? * Optional - Team can demo & discuss NFS/SMB/Object integration into Scale Lunch and refreshments will be provided. Please can you let me know by email if you are interested in attending along with your preferred venue by Friday 10th July. Thanks and we hope to see you there! -- Claire O'Toole (n?e Robson) GPFS User Group Secretary +44 (0)7508 033896 From chair at gpfsug.org Wed Jul 1 09:21:03 2015 From: chair at gpfsug.org (GPFS UG Chair) Date: Wed, 1 Jul 2015 09:21:03 +0100 Subject: [gpfsug-discuss] mailing list change Message-ID: Hi All, We've made a change to the mailing list so that only subscribers are now able to post to the list. We've done this as we've been getting a *lot* of spam held for moderation from non-members and the occasional legitimate post was getting lost in the spam. If you or colleagues routinely post from a different address from that subscribed to the list, you'll now need to be subscribed (you'll get an error back from the list when you try to post). 
As its a mailman list, if you do want to have multiple addresses subscribed, you can of course disable the address from the mailman interface from receiving posts. Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Jul 1 15:21:29 2015 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 1 Jul 2015 07:21:29 -0700 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> Message-ID: <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Daniel Vogel To: "'gpfsug-discuss at gpfsug.org'" Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jul 1 15:32:50 2015 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Jul 2015 14:32:50 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> Message-ID: Sven, It?s been a while since I tried that, but the last time I tried to limit the impact of a restripe by only running it on a few NSD server nodes it made things worse. 
Everybody was as slowed down as they would?ve been if I?d thrown every last NSD server we have at it and they were slowed down for longer, since using fewer NSD servers meant the restripe ran longer. What we do is always kick off restripes on a Friday afternoon, throw every NSD server we have at them, and let them run over the weekend. Interactive use is lower then and people don?t notice or care if their batch jobs run longer. Of course, this is all just my experiences. YMMV... Kevin On Jul 1, 2015, at 9:21 AM, Sven Oehme > wrote: Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.hunter at yale.edu Wed Jul 1 16:52:07 2015 From: chris.hunter at yale.edu (Chris Hunter) Date: Wed, 01 Jul 2015 11:52:07 -0400 Subject: [gpfsug-discuss] gpfs rdma expels Message-ID: <55940CA7.9010506@yale.edu> Hi UG list, We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of clients use RDMA. We see a large number of expels of rdma clients but less of the tcp clients. Most of the gpfs config is at defaults. We are unclear if any of the non-RDMA config items (eg. 
Idle socket timeout) would help our issue. Any suggestions on gpfs config parameters we should investigate ? thank-you in advance, chris hunter yale hpc group From viccornell at gmail.com Wed Jul 1 16:58:31 2015 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 1 Jul 2015 16:58:31 +0100 Subject: [gpfsug-discuss] gpfs rdma expels In-Reply-To: <55940CA7.9010506@yale.edu> References: <55940CA7.9010506@yale.edu> Message-ID: <6E28C0FB-2F99-4127-B1F2-272BA2532330@gmail.com> If it used to work then its probably not config. Most expels are the result of network connectivity problems. If your cluster is not too big try looking at ping from every node to every other node and look for large latencies. Also look to see who is expelling who. Ie - if your RDMA nodes are being expelled by non-RDMA nodes. It may point to a weakness in your network which GPFS ,being as it is a great finder of weaknesses, is having a problem with. Also more details (network config etc) will elicit more detailed suggestions. Cheers, Vic > On 1 Jul 2015, at 16:52, Chris Hunter wrote: > > Hi UG list, > We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of clients use RDMA. We see a large number of expels of rdma clients but less of the tcp clients. > Most of the gpfs config is at defaults. We are unclear if any of the non-RDMA config items (eg. Idle socket timeout) would help our issue. Any suggestions on gpfs config parameters we should investigate ? > > thank-you in advance, > chris hunter > yale hpc group > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stijn.deweirdt at ugent.be Thu Jul 2 07:42:30 2015 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Thu, 02 Jul 2015 08:42:30 +0200 Subject: [gpfsug-discuss] gpfs rdma expels In-Reply-To: <6E28C0FB-2F99-4127-B1F2-272BA2532330@gmail.com> References: <55940CA7.9010506@yale.edu> <6E28C0FB-2F99-4127-B1F2-272BA2532330@gmail.com> Message-ID: <5594DD56.6010302@ugent.be> do you use ipoib for the rdma nodes or regular ethernet? and what OS are you on? we had issue with el7.1 kernel and ipoib; there's packet loss with ipoib and mlnx_ofed (and mlnx engineering told that it might be in basic ofed from el7.1 too). 7.0 kernels are ok) and client expels were the first signs on our setup. stijn On 07/01/2015 05:58 PM, Vic Cornell wrote: > If it used to work then its probably not config. Most expels are the result of network connectivity problems. > > If your cluster is not too big try looking at ping from every node to every other node and look for large latencies. > > Also look to see who is expelling who. Ie - if your RDMA nodes are being expelled by non-RDMA nodes. It may point to a weakness in your network which GPFS ,being as it is a great finder of weaknesses, is having a problem with. > > Also more details (network config etc) will elicit more detailed suggestions. > > Cheers, > > Vic > > > >> On 1 Jul 2015, at 16:52, Chris Hunter wrote: >> >> Hi UG list, >> We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of clients use RDMA. We see a large number of expels of rdma clients but less of the tcp clients. >> Most of the gpfs config is at defaults. We are unclear if any of the non-RDMA config items (eg. Idle socket timeout) would help our issue. Any suggestions on gpfs config parameters we should investigate ? 
>> >> thank-you in advance, >> chris hunter >> yale hpc group >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From Daniel.Vogel at abcsystems.ch Thu Jul 2 08:12:32 2015 From: Daniel.Vogel at abcsystems.ch (Daniel Vogel) Date: Thu, 2 Jul 2015 07:12:32 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> Message-ID: <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS]Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? 
The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From chris.howarth at citi.com Thu Jul 2 08:24:37 2015 From: chris.howarth at citi.com (Howarth, Chris ) Date: Thu, 2 Jul 2015 07:24:37 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> Message-ID: <0609A0AC1B1CA9408D88D4144C5C990B75D89CF5@EXLNMB52.eur.nsroot.net> Daniel ?in our environment we have data and metadata split out onto separate drives in separate servers. We also set the GPFS parameter ?mmchconfig defaultHelperNodes=?list_of_metadata_servers? which will automatically only use these nodes for the scan for restriping/rebalancing data (rather than having to specify the ?N option). This dramatically reduced the impact to clients accessing the data nodes while these activities are taking place. Also using SSDs for metadata nodes can make a big improvement. Chris From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Daniel Vogel Sent: Thursday, July 02, 2015 8:13 AM To: 'gpfsug main discussion list' Subject: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . 
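To make the two suggestions above concrete, a rough sketch only; the node and file system names are made up:

mmchconfig defaultHelperNodes=nsd01,nsd02    # use these nodes for restripe/rebalance scans by default
mmrestripefs gpfs01 -b -N nsd01,nsd02        # or name the helper nodes explicitly on the command

Either way the scan work is confined to the listed nodes, so it controls where the load lands rather than how much I/O the restripe itself generates.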
sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS]Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From chris.hunter at yale.edu Thu Jul 2 14:01:53 2015 From: chris.hunter at yale.edu (Chris Hunter) Date: Thu, 02 Jul 2015 09:01:53 -0400 Subject: [gpfsug-discuss] gpfs rdma expels Message-ID: <55953641.4010701@yale.edu> Thanks for the feedback. Our network is non-uniform, we have three (uniform) rdma networks connected by narrow uplinks. Previously we used gpfs on one network, now we wish to expand to the other networks. Previous experience shows we see "PortXmitWait" messages from traffic over the narrow uplinks. We find expels happen often from gpfs communication over the narrow uplinks. We acknowledge an inherent weakness with narrow uplinks but for practical reasons it would be difficult to resolve. So the question, is it possible to configure gpfs to be tolerant of non-uniform networks with narrow uplinks ? thanks, chris hunter > On 1 Jul 2015, at 16:52, Chris Hunter wrote: > > Hi UG list, > We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of > clients use RDMA. We see a large number of expels of rdma clients but > less of the tcp clients. Most of the gpfs config is at defaults. We > are unclear if any of the non-RDMA config items (eg. Idle socket > timeout) would help our issue. Any suggestions on gpfs config > parameters we should investigate ? From S.J.Thompson at bham.ac.uk Thu Jul 2 16:43:03 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 2 Jul 2015 15:43:03 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support Message-ID: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? 
>From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon From GARWOODM at uk.ibm.com Thu Jul 2 16:55:42 2015 From: GARWOODM at uk.ibm.com (Michael Garwood7) Date: Thu, 2 Jul 2015 16:55:42 +0100 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Simon, 1. Most of the chef recipes involve installing the various packages required for the protocols, and some of the new performance monitoring packages required for mmperfquery. There is a series of steps for proper manual install at http://www-01.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.adv.doc/bl1adv_ces_features.htm but this assumes you have all IBM Samba RPMs and prerequisites installed. The recipes *should* be split out so that at the very least, RPM install is done in its own recipe without configuring or enabling anything... 2. I am not 100% sure what deploying IBM Samba on the cluster will do with regards to sernet samba. As far as I am aware there is no code in the installer or chef recipes to check for other samba deployments running but I may be mistaken. Depending on how sernet samba hooks to GPFS I can't think of any reason why it would cause problems aside from the risk of the protocols not communicating and causing issues with file locks/data overwrites, depending on what workload you have running on samba. 3. I haven't personally seen multicluster deployments done or tested before, but no, it is not officially supported. The installer has been written with the assumption that you are installing to one cluster, so I wouldn't recommend trying with multiple clusters - unforseen consequences :) Regards, Michael Garwood IBM Systems Developer Phone: 44-161-905-4118 E-mail: GARWOODM at uk.ibm.com 40 Blackfriars Street Manchester, M3 2EG United Kingdom IBM United Kingdom Limited Registered in England and Wales with number 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list , Date: 02/07/2015 16:43 Subject: [gpfsug-discuss] 4.1.1 protocol support Sent by: gpfsug-discuss-bounces at gpfsug.org Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. 
My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Thu Jul 2 17:02:01 2015 From: oester at gmail.com (Bob Oesterlin) Date: Thu, 2 Jul 2015 11:02:01 -0500 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. You do have the option of doing things manually as well. The guide that describes it is pretty good. If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Hi, > > Just wondering if anyone has looked at the new protocol support stuff in > 4.1.1 yet? > > From what I can see, it wants to use the installer to add things like IBM > Samba onto nodes in the cluster. The docs online seem to list manual > installation as running the chef template, which is hardly manual... > > 1. Id like to know what is being run on my cluster > 2. Its an existing install which was using sernet samba, so I don't want > to go out and break anything inadvertently > 3. My protocol nodes are in a multicluster, and I understand the installer > doesn't support multicluster. > > (the docs state that multicluster isn't supported but something like its > expected to work). > > So... Has anyone had a go at this yet and have a set of steps? > > I've started unpicking the chef recipe, but just wondering if anyone had > already had a go at this? > > (and lets not start on the mildy bemusing error when you "enable" the > service with "mmces service enable" (ces service not enabled) - there's > other stuff to enable it)... > > Simon > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Thu Jul 2 19:52:28 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 2 Jul 2015 18:52:28 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Michael, Thanks for that link. This is the docs I?d found before: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_manualprotocols.htm I guess one of the reasons for wanting to unpick is because we already have configuration management tools all in place. I have no issue about GPFS config being inside GPFS, but we really need to know what is going on (and we can manage to get the RPMs all on etc if we know what is needed from the config management tool). I do note that it needs CCR enabled, which we currently don?t have. Now I think this was because we saw issues with mmsdrestore when adding a node that had been reinstalled back into the cluster. I need to check if that is still the case (we work on being able to pull clients, NSDs etc from the cluster and using xcat to reprovision and the a config tool to do the relevant bits to rejoin the cluster ? makes it easier for us to stage kernel, GPFS, OFED updates as we just blat on a new image). I don?t really want to have a mix of Sernet and IBM samba on there, so am happy to pull out those bits, but obviously need to get the IBM bits working as well. Multicluster ? well, our ?protocol? cluster is a separate cluster from the NSD cluster (can?t remote expel, might want to add other GPFS clusters to the protocol layer etc). Of course the multi cluster talks GPFS protocol, so I don?t see any reason why it shouldn?t work, but yes, noted its not supported. Simon From: Michael Garwood7 > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 16:55 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon, 1. Most of the chef recipes involve installing the various packages required for the protocols, and some of the new performance monitoring packages required for mmperfquery. There is a series of steps for proper manual install at http://www-01.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.adv.doc/bl1adv_ces_features.htm but this assumes you have all IBM Samba RPMs and prerequisites installed. The recipes *should* be split out so that at the very least, RPM install is done in its own recipe without configuring or enabling anything... 2. I am not 100% sure what deploying IBM Samba on the cluster will do with regards to sernet samba. As far as I am aware there is no code in the installer or chef recipes to check for other samba deployments running but I may be mistaken. Depending on how sernet samba hooks to GPFS I can't think of any reason why it would cause problems aside from the risk of the protocols not communicating and causing issues with file locks/data overwrites, depending on what workload you have running on samba. 3. I haven't personally seen multicluster deployments done or tested before, but no, it is not officially supported. 
The installer has been written with the assumption that you are installing to one cluster, so I wouldn't recommend trying with multiple clusters - unforseen consequences :) Regards, Michael Garwood IBM Systems Developer ________________________________ Phone: 44-161-905-4118 E-mail: GARWOODM at uk.ibm.com 40 Blackfriars Street Manchester, M3 2EG United Kingdom IBM United Kingdom Limited Registered in England and Wales with number 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU From: "Simon Thompson (Research Computing - IT Services)" > To: gpfsug main discussion list >, Date: 02/07/2015 16:43 Subject: [gpfsug-discuss] 4.1.1 protocol support Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 2 19:58:12 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 2 Jul 2015 18:58:12 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Bob, Thanks, I?ll have a look through the link Michael sent me and shout if I get stuck? Looks a bit different to the previous way were we running this with ctdb etc. Our protocol nodes are already running 7.1 (though CentOS which means the mmbuildgpl command doesn?t work, would be much nice of course if the init script detected the kernel had changed and did a build etc automagically ?). Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 17:02 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. You do have the option of doing things manually as well. The guide that describes it is pretty good. 
If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Thu Jul 2 20:03:02 2015 From: oester at gmail.com (Bob Oesterlin) Date: Thu, 2 Jul 2015 14:03:02 -0500 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: On Thu, Jul 2, 2015 at 1:52 PM, Simon Thompson (Research Computing - IT Services) wrote: > I do note that it needs CCR enabled, which we currently don?t have. Now I > think this was because we saw issues with mmsdrestore when adding a node > that had been reinstalled back into the cluster. I need to check if that is > still the case (we work on being able to pull clients, NSDs etc from the > cluster and using xcat to reprovision and the a config tool to do the > relevant bits to rejoin the cluster ? makes it easier for us to stage > kernel, GPFS, OFED updates as we just blat on a new image). > Yes, and this is why we couldn't use CCR - our compute nodes are netboot, so they go thru a mmsdrrestore every time they reboot. Now, they have fixed this in 4.1.1, which means if you can get (the cluster) to 4.1.1 and turn on CCR, mmsdrrestore should work. Note to self: Test this out in your sandbox cluster. :-) Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 3 12:22:06 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 3 Jul 2015 11:22:06 +0000 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) Message-ID: Bob, (anyone?) Have you tried mmsdrestore to see if its working in 4.1.1? # mmsdrrestore -p PRIMARY -R /usr/bin/scp Fri 3 Jul 11:56:05 BST 2015: mmsdrrestore: Processing node PRIMARY ccrio initialization failed (err 811) mmsdrrestore: Unable to retrieve GPFS cluster files from CCR. mmsdrrestore: Unexpected error from updateMmfsEnvironment. Return code: 1 mmsdrrestore: Command failed. Examine previous error messages to determine cause. 
It seems to copy the mmsdrfs file to the local node into /var/mmfs/gen/mmsdrfs but then fails to actually work. Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 20:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support On Thu, Jul 2, 2015 at 1:52 PM, Simon Thompson (Research Computing - IT Services) > wrote: I do note that it needs CCR enabled, which we currently don?t have. Now I think this was because we saw issues with mmsdrestore when adding a node that had been reinstalled back into the cluster. I need to check if that is still the case (we work on being able to pull clients, NSDs etc from the cluster and using xcat to reprovision and the a config tool to do the relevant bits to rejoin the cluster ? makes it easier for us to stage kernel, GPFS, OFED updates as we just blat on a new image). Yes, and this is why we couldn't use CCR - our compute nodes are netboot, so they go thru a mmsdrrestore every time they reboot. Now, they have fixed this in 4.1.1, which means if you can get (the cluster) to 4.1.1 and turn on CCR, mmsdrrestore should work. Note to self: Test this out in your sandbox cluster. :-) Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 3 12:50:31 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 3 Jul 2015 11:50:31 +0000 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: Actually, no just ignore me, it does appear to be fixed in 4.1.1 * I cleaned up the node by removing the 4.1.1 packages, then cleaned up /var/mmfs, but then when the config tool reinstalled, it put 4.1.0 back on and didn?t apply the updates to 4.1.1, so it must have been an older version of mmsdrrestore Simon From: Simon Thompson > Reply-To: gpfsug main discussion list > Date: Friday, 3 July 2015 12:22 To: gpfsug main discussion list > Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) Bob, (anyone?) Have you tried mmsdrestore to see if its working in 4.1.1? # mmsdrrestore -p PRIMARY -R /usr/bin/scp Fri 3 Jul 11:56:05 BST 2015: mmsdrrestore: Processing node PRIMARY ccrio initialization failed (err 811) mmsdrrestore: Unable to retrieve GPFS cluster files from CCR. mmsdrrestore: Unexpected error from updateMmfsEnvironment. Return code: 1 mmsdrrestore: Command failed. Examine previous error messages to determine cause. It seems to copy the mmsdrfs file to the local node into /var/mmfs/gen/mmsdrfs but then fails to actually work. Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 20:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support On Thu, Jul 2, 2015 at 1:52 PM, Simon Thompson (Research Computing - IT Services) > wrote: I do note that it needs CCR enabled, which we currently don?t have. Now I think this was because we saw issues with mmsdrestore when adding a node that had been reinstalled back into the cluster. I need to check if that is still the case (we work on being able to pull clients, NSDs etc from the cluster and using xcat to reprovision and the a config tool to do the relevant bits to rejoin the cluster ? makes it easier for us to stage kernel, GPFS, OFED updates as we just blat on a new image). 
Yes, and this is why we couldn't use CCR - our compute nodes are netboot, so they go thru a mmsdrrestore every time they reboot. Now, they have fixed this in 4.1.1, which means if you can get (the cluster) to 4.1.1 and turn on CCR, mmsdrrestore should work. Note to self: Test this out in your sandbox cluster. :-) Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Jul 3 13:21:43 2015 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 3 Jul 2015 07:21:43 -0500 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: On Fri, Jul 3, 2015 at 6:22 AM, Simon Thompson (Research Computing - IT Services) wrote: > Have you tried mmsdrestore to see if its working in 4.1.1? Well, no actually :) They told me it was fixed but I have never got 'round to checking it during my beta testing. If it's not, I say submit a PMR and let's get them to fix it - I will do the same. It would be nice to actually use CCR, especially if the new protocol support depends on it. Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Jul 3 13:22:37 2015 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 3 Jul 2015 07:22:37 -0500 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: On Fri, Jul 3, 2015 at 6:22 AM, Simon Thompson (Research Computing - IT Services) wrote: > Have you tried mmsdrestore to see if its working in 4.1.1? One thing - did you try this on a pure 4.1.1 cluster with release=LATEST? Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 3 13:28:10 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 3 Jul 2015 12:28:10 +0000 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: It was on a pure cluster with 4.1.1 only. (I had to do that a precursor to start enabling CES). As I mentioned, I messed up with 4.1.0 client installed so it doesn?t work from a mixed version, but did work from pure 4.1.1 Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Friday, 3 July 2015 13:22 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) On Fri, Jul 3, 2015 at 6:22 AM, Simon Thompson (Research Computing - IT Services) > wrote: Have you tried mmsdrestore to see if its working in 4.1.1? One thing - did you try this on a pure 4.1.1 cluster with release=LATEST? Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Fri Jul 3 23:48:38 2015 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 3 Jul 2015 15:48:38 -0700 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch><201507011422.t61EMZmw011626@d01av01.pok.ibm.com> <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> Message-ID: <201507032249.t63Mnffp025995@d03av03.boulder.ibm.com> this triggers a few questions 1. have you tried running it only on a node that doesn't serve NFS data ? 2. what NFS stack are you using ? 
is this the kernel NFS Server as part of linux means you use cNFS ? if the answer to 2 is yes, have you adjusted the nfsd threads in /etc/sysconfig/nfs ? the default is only 8 and if you run with the default you have a very low number of threads from the outside competing with a larger number of threads doing restripe, increasing the nfsd threads could help. you could also reduce the number of internal restripe threads to try out if that helps mitigating the impact. to try an extreme low value set the following : mmchconfig pitWorkerThreadsPerNode=1 -i and retry the restripe again, to reset it back to default run mmchconfig pitWorkerThreadsPerNode=DEFAULT -i sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Daniel Vogel To: "'gpfsug main discussion list'" Date: 07/02/2015 12:12 AM Subject: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoSDaniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel To: "'gpfsug-discuss at gpfsug.org'" Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. 
These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Jul 6 11:09:08 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 6 Jul 2015 10:09:08 +0000 Subject: [gpfsug-discuss] SMB support and config Message-ID: Hi, (sorry, lots of questions about this stuff at the moment!) I?m currently looking at removing the sernet smb configs we had previously and moving to IBM SMB. I?ve removed all the old packages and only now have gpfs.smb installed on the systems. I?m struggling to get the config tools to work for our environment. We have MS Windows AD Domain for authentication. For various reasons, however doesn?t hold the UIDs/GIDs, which are instead held in a different LDAP directory. In the past, we?d configure the Linux servers running Samba so that NSLCD was configured to get details from the LDAP server. (e.g. getent passwd would return the data for an AD user). The Linux boxes would also be configured to use KRB5 authentication where users were allowed to ssh etc in for password authentication. So as far as Samba was concerned, it would do ?security = ADS? and then we?d also have "idmap config * : backend = tdb2? I.e. Use Domain for authentication, but look locally for ID mapping data. Now I can configured IBM SMB to use ADS for authentication: mmuserauth service create --type ad --data-access-method file --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF --idmap-role subordinate However I can?t see anyway for me to manipulate the config so that it doesn?t use autorid. Using this we end up with: mmsmb config list | grep -i idmap idmap config * : backend autorid idmap config * : range 10000000-299999999 idmap config * : rangesize 1000000 idmap config * : read only yes idmap:cache no It also adds: mmsmb config list | grep -i auth auth methods guest sam winbind (though I don?t think that is a problem). I also can?t change the idmap using the mmsmb command (I think would look like this): # mmsmb config change --option="idmap config * : backend=tdb2" idmap config * : backend=tdb2: [E] Unsupported smb option. More information about smb options is availabe in the man page. I can?t see anything in the docs at: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm That give me a clue how to do what I want. I?d be happy to do some mixture of AD for authentication and LDAP for lookups (rather than just falling back to ?local? 
from nslcd), but I can?t see a way to do this, and ?manual? seems to stop ADS authentication in Samba. Anyone got any suggestions? Thanks Simon From kallbac at iu.edu Mon Jul 6 23:06:00 2015 From: kallbac at iu.edu (Kallback-Rose, Kristy A) Date: Mon, 6 Jul 2015 22:06:00 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: References: Message-ID: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Just to chime in as another interested party, we do something fairly similar but use sssd instead of nslcd. Very interested to see how accommodating the IBM Samba is to local configuration needs. Best, Kristy On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT Services) wrote: > Hi, > > (sorry, lots of questions about this stuff at the moment!) > > I?m currently looking at removing the sernet smb configs we had previously > and moving to IBM SMB. I?ve removed all the old packages and only now have > gpfs.smb installed on the systems. > > I?m struggling to get the config tools to work for our environment. > > We have MS Windows AD Domain for authentication. For various reasons, > however doesn?t hold the UIDs/GIDs, which are instead held in a different > LDAP directory. > > In the past, we?d configure the Linux servers running Samba so that NSLCD > was configured to get details from the LDAP server. (e.g. getent passwd > would return the data for an AD user). The Linux boxes would also be > configured to use KRB5 authentication where users were allowed to ssh etc > in for password authentication. > > So as far as Samba was concerned, it would do ?security = ADS? and then > we?d also have "idmap config * : backend = tdb2? > > I.e. Use Domain for authentication, but look locally for ID mapping data. > > Now I can configured IBM SMB to use ADS for authentication: > > mmuserauth service create --type ad --data-access-method file > --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF > --idmap-role subordinate > > > However I can?t see anyway for me to manipulate the config so that it > doesn?t use autorid. Using this we end up with: > > mmsmb config list | grep -i idmap > idmap config * : backend autorid > idmap config * : range 10000000-299999999 > idmap config * : rangesize 1000000 > idmap config * : read only yes > idmap:cache no > > > It also adds: > > mmsmb config list | grep -i auth > auth methods guest sam winbind > > (though I don?t think that is a problem). > > > I also can?t change the idmap using the mmsmb command (I think would look > like this): > # mmsmb config change --option="idmap config * : backend=tdb2" > idmap config * : backend=tdb2: [E] Unsupported smb option. More > information about smb options is availabe in the man page. > > > > I can?t see anything in the docs at: > http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect > rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm > > That give me a clue how to do what I want. > > I?d be happy to do some mixture of AD for authentication and LDAP for > lookups (rather than just falling back to ?local? from nslcd), but I can?t > see a way to do this, and ?manual? seems to stop ADS authentication in > Samba. > > Anyone got any suggestions? 
> > > Thanks > > Simon > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Tue Jul 7 12:39:24 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 7 Jul 2015 11:39:24 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Message-ID: So based on what I?m seeing ... When you run mmstartup, the start process edits /etc/nsswitch.conf. I?ve managed to make it work in my environment, but I had to edit the file /usr/lpp/mmfs/bin/mmcesop to make it put ldap instead of winbind when it starts up. I also had to do some studious use of "net conf delparm? ? Which is probably not a good idea. I did try using: mmuserauth service create --type userdefined --data-access-method file And the setting the "security = ADS? parameters by hand with "net conf? (can?t do it with mmsmb), and a manual ?net ads join" but I couldn?t get it to authenticate clients properly. I can?t work out why just at the moment. But even then when mmshutdown runs, it still goes ahead and edits /etc/nsswitch.conf I?ve got a ticket open with IBM at the moment via our integrator to see what they say. But I?m not sure I like something going off and poking things like /etc/nsswitch.conf at startup/shutdown. I can sorta see that at config time, but when service start etc, I?m not sure I really like that idea! Simon On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: >Just to chime in as another interested party, we do something fairly >similar but use sssd instead of nslcd. Very interested to see how >accommodating the IBM Samba is to local configuration needs. > >Best, >Kristy > >On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >Services) wrote: > >> Hi, >> >> (sorry, lots of questions about this stuff at the moment!) >> >> I?m currently looking at removing the sernet smb configs we had >>previously >> and moving to IBM SMB. I?ve removed all the old packages and only now >>have >> gpfs.smb installed on the systems. >> >> I?m struggling to get the config tools to work for our environment. >> >> We have MS Windows AD Domain for authentication. For various reasons, >> however doesn?t hold the UIDs/GIDs, which are instead held in a >>different >> LDAP directory. >> >> In the past, we?d configure the Linux servers running Samba so that >>NSLCD >> was configured to get details from the LDAP server. (e.g. getent passwd >> would return the data for an AD user). The Linux boxes would also be >> configured to use KRB5 authentication where users were allowed to ssh >>etc >> in for password authentication. >> >> So as far as Samba was concerned, it would do ?security = ADS? and then >> we?d also have "idmap config * : backend = tdb2? >> >> I.e. Use Domain for authentication, but look locally for ID mapping >>data. >> >> Now I can configured IBM SMB to use ADS for authentication: >> >> mmuserauth service create --type ad --data-access-method file >> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >> --idmap-role subordinate >> >> >> However I can?t see anyway for me to manipulate the config so that it >> doesn?t use autorid. 
Using this we end up with: >> >> mmsmb config list | grep -i idmap >> idmap config * : backend autorid >> idmap config * : range 10000000-299999999 >> idmap config * : rangesize 1000000 >> idmap config * : read only yes >> idmap:cache no >> >> >> It also adds: >> >> mmsmb config list | grep -i auth >> auth methods guest sam winbind >> >> (though I don?t think that is a problem). >> >> >> I also can?t change the idmap using the mmsmb command (I think would >>look >> like this): >> # mmsmb config change --option="idmap config * : backend=tdb2" >> idmap config * : backend=tdb2: [E] Unsupported smb option. More >> information about smb options is availabe in the man page. >> >> >> >> I can?t see anything in the docs at: >> >>http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>ct >> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >> >> That give me a clue how to do what I want. >> >> I?d be happy to do some mixture of AD for authentication and LDAP for >> lookups (rather than just falling back to ?local? from nslcd), but I >>can?t >> see a way to do this, and ?manual? seems to stop ADS authentication in >> Samba. >> >> Anyone got any suggestions? >> >> >> Thanks >> >> Simon >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From TROPPENS at de.ibm.com Thu Jul 9 07:55:24 2015 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Thu, 9 Jul 2015 08:55:24 +0200 Subject: [gpfsug-discuss] ISC 2015 Message-ID: Anybody at ISC 2015 in Frankfurt next week? I am happy to share my experience with supporting four ESP (a.k.a beta) customers of the new protocol feature. You can find me at the IBM booth (Booth 928). -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From daniel.kidger at uk.ibm.com Thu Jul 9 09:12:51 2015 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Thu, 9 Jul 2015 09:12:51 +0100 Subject: [gpfsug-discuss] ISC 2015 In-Reply-To: Message-ID: <1970894201.4637011436429559512.JavaMail.notes@d06wgw86.portsmouth.uk.ibm.com> Ulf, I am certainly interested. You can find me on the IBM booth too :-) Looking forward to meeting you. Daniel Sent from IBM Verse Ulf Troppens --- [gpfsug-discuss] ISC 2015 --- From:"Ulf Troppens" To:"gpfsug main discussion list" Date:Thu, 9 Jul 2015 08:55Subject:[gpfsug-discuss] ISC 2015 Anybody at ISC 2015 in Frankfurt next week? I am happy to share my experience with supporting four ESP (a.k.a beta) customers of the new protocol feature. You can find me at the IBM booth (Booth 928). 
-- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Jul 9 15:56:42 2015 From: ewahl at osc.edu (Wahl, Edward) Date: Thu, 9 Jul 2015 14:56:42 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: , Message-ID: <9DA9EC7A281AC7428A9618AFDC49049955A5DCC4@CIO-KRC-D1MBX02.osuad.osu.edu> Please please please please PLEASE tell me that support for RHEL 6 is in the plan for protocol nodes. Forcing us to 7 seems rather VERY premature. been out sick a week so I just saw this, FYI. I'd sell my co-workers to test out protocol nodes, but frankly NOT on RHEL 7. Definitely NOT an HPC ready release. ugh. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Bob Oesterlin [oester at gmail.com] Sent: Thursday, July 02, 2015 12:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. You do have the option of doing things manually as well. The guide that describes it is pretty good. If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sdinardo at ebi.ac.uk Fri Jul 10 11:07:28 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Fri, 10 Jul 2015 11:07:28 +0100 Subject: [gpfsug-discuss] data interface and management infercace. Message-ID: <559F9960.7010509@ebi.ac.uk> Hello guys. Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: --daemon-interface={hostname | ip_address} Specifies the host name or IP address _*to be used by the GPFS daemons for node-to-node communication*_. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command tothe desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. What exactly means "node-to node-communications" ? Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? _**_ Thanks in advance, Salvatore -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From secretary at gpfsug.org Fri Jul 10 12:33:48 2015 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Fri, 10 Jul 2015 12:33:48 +0100 Subject: [gpfsug-discuss] Places available: Meet the Devs Message-ID: Dear All, There are a couple of places remaining at the next 'Meet the Devs' event on Wednesday 29th July, 11am-3pm. The event is being held at IBM Warwick. The agenda promises to be hands on and give you the opportunity to speak face to face with the developers of GPFS. Guideline agenda: * Data analytic workloads - development to show and tell UK work on establishing use cases and tighter integration of Spark on top of GPFS * Show the GUI coming in 4.2 * Discuss 4.2 and beyond roadmap * How would you like IP management to work for protocol access? * Optional - Team can demo & discuss NFS/SMB/Object integration into Scale Lunch and refreshments will be provided. Please can you let me know by email if you are interested in attending and I'll register your place. Thanks and we hope to see you there! -- Claire O'Toole (n?e Robson) GPFS User Group Secretary +44 (0)7508 033896 www.gpfsug.org From S.J.Thompson at bham.ac.uk Fri Jul 10 12:59:19 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 10 Jul 2015 11:59:19 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: <9DA9EC7A281AC7428A9618AFDC49049955A5DCC4@CIO-KRC-D1MBX02.osuad.osu.edu> References: <9DA9EC7A281AC7428A9618AFDC49049955A5DCC4@CIO-KRC-D1MBX02.osuad.osu.edu> Message-ID: Hi Ed, Well, technically: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_protocolsprerequisites.htm Says "The spectrumscale installation toolkit supports Red Hat Enterprise Linux 7.0 and 7.1 platforms on x86_64 and ppc64 architectures" So maybe if you don?t want to use the installer, you don't need RHEL 7. Of course where or not that is supported, only IBM would be able to say ? I?ve only looked at gpfs.smb, but as its provided as a binary RPM, it might or might not work in a 6 environment (it bundles ctdb etc all in). For object, as its a bundle of openstack RPMs, then potentially it won?t work on EL6 depending on the python requirements? And surely you aren?t running protocol support on HPC nodes anyway ... so maybe a few EL7 nodes could work for you? Simon From: , Edward > Reply-To: gpfsug main discussion list > Date: Thursday, 9 July 2015 15:56 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Please please please please PLEASE tell me that support for RHEL 6 is in the plan for protocol nodes. Forcing us to 7 seems rather VERY premature. been out sick a week so I just saw this, FYI. I'd sell my co-workers to test out protocol nodes, but frankly NOT on RHEL 7. Definitely NOT an HPC ready release. ugh. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Bob Oesterlin [oester at gmail.com] Sent: Thursday, July 02, 2015 12:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. 
You do have the option of doing things manually as well. The guide that describes it is pretty good. If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 10 13:06:01 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 10 Jul 2015 12:06:01 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Message-ID: So IBM came back and said what I was doing wasn?t supported. They did say that you can use ?user defined? authentication. Which I?ve got working now on my environment (figured what I was doing wrong, and you can?t use mmsmb to do some of the bits I need for it to work for user defined mode for me...). But I still think it needs a patch to one of the files for CES for use in user defined authentication. (Right now it appears to remove all my ?user defined? settings from nsswitch.conf when you stop CES/GPFS on a node). I?ve supplied my patch to IBM which works for my case, we?ll see what they do about it? (If people are interested, I?ll gather my notes into a blog post). Simon On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: >Just to chime in as another interested party, we do something fairly >similar but use sssd instead of nslcd. Very interested to see how >accommodating the IBM Samba is to local configuration needs. > >Best, >Kristy > >On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >Services) wrote: > >> Hi, >> >> (sorry, lots of questions about this stuff at the moment!) >> >> I?m currently looking at removing the sernet smb configs we had >>previously >> and moving to IBM SMB. I?ve removed all the old packages and only now >>have >> gpfs.smb installed on the systems. >> >> I?m struggling to get the config tools to work for our environment. >> >> We have MS Windows AD Domain for authentication. 
For various reasons, >> however doesn?t hold the UIDs/GIDs, which are instead held in a >>different >> LDAP directory. >> >> In the past, we?d configure the Linux servers running Samba so that >>NSLCD >> was configured to get details from the LDAP server. (e.g. getent passwd >> would return the data for an AD user). The Linux boxes would also be >> configured to use KRB5 authentication where users were allowed to ssh >>etc >> in for password authentication. >> >> So as far as Samba was concerned, it would do ?security = ADS? and then >> we?d also have "idmap config * : backend = tdb2? >> >> I.e. Use Domain for authentication, but look locally for ID mapping >>data. >> >> Now I can configured IBM SMB to use ADS for authentication: >> >> mmuserauth service create --type ad --data-access-method file >> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >> --idmap-role subordinate >> >> >> However I can?t see anyway for me to manipulate the config so that it >> doesn?t use autorid. Using this we end up with: >> >> mmsmb config list | grep -i idmap >> idmap config * : backend autorid >> idmap config * : range 10000000-299999999 >> idmap config * : rangesize 1000000 >> idmap config * : read only yes >> idmap:cache no >> >> >> It also adds: >> >> mmsmb config list | grep -i auth >> auth methods guest sam winbind >> >> (though I don?t think that is a problem). >> >> >> I also can?t change the idmap using the mmsmb command (I think would >>look >> like this): >> # mmsmb config change --option="idmap config * : backend=tdb2" >> idmap config * : backend=tdb2: [E] Unsupported smb option. More >> information about smb options is availabe in the man page. >> >> >> >> I can?t see anything in the docs at: >> >>http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>ct >> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >> >> That give me a clue how to do what I want. >> >> I?d be happy to do some mixture of AD for authentication and LDAP for >> lookups (rather than just falling back to ?local? from nslcd), but I >>can?t >> see a way to do this, and ?manual? seems to stop ADS authentication in >> Samba. >> >> Anyone got any suggestions? >> >> >> Thanks >> >> Simon >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Daniel.Vogel at abcsystems.ch Fri Jul 10 15:19:11 2015 From: Daniel.Vogel at abcsystems.ch (Daniel Vogel) Date: Fri, 10 Jul 2015 14:19:11 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <201507032249.t63Mnffp025995@d03av03.boulder.ibm.com> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch><201507011422.t61EMZmw011626@d01av01.pok.ibm.com> <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> <201507032249.t63Mnffp025995@d03av03.boulder.ibm.com> Message-ID: <2CDF270206A255459AC4FA6B08E52AF90114635E8E@ABCSYSEXC1.abcsystems.ch> For ?1? we use the quorum node to do ?start disk? or ?restripe file system? (quorum node without disks). For ?2? we use kernel NFS with cNFS I used the command ?cnfsNFSDprocs 64? to set the NFS threads. Is this correct? 
gpfs01:~ # cat /proc/fs/nfsd/threads 64 I will verify the settings in our lab, will use the following configuration: mmchconfig worker1Threads=128 mmchconfig prefetchThreads=128 mmchconfig nsdMaxWorkerThreads=128 mmchconfig cnfsNFSDprocs=256 daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Samstag, 4. Juli 2015 00:49 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? this triggers a few questions 1. have you tried running it only on a node that doesn't serve NFS data ? 2. what NFS stack are you using ? is this the kernel NFS Server as part of linux means you use cNFS ? if the answer to 2 is yes, have you adjusted the nfsd threads in /etc/sysconfig/nfs ? the default is only 8 and if you run with the default you have a very low number of threads from the outside competing with a larger number of threads doing restripe, increasing the nfsd threads could help. you could also reduce the number of internal restripe threads to try out if that helps mitigating the impact. to try an extreme low value set the following : mmchconfig pitWorkerThreadsPerNode=1 -i and retry the restripe again, to reset it back to default run mmchconfig pitWorkerThreadsPerNode=DEFAULT -i sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Beschreibung: Inactive hide details for Daniel Vogel ---07/02/2015 12:12:46 AM---Sven, Yes I agree, but ?using ?N? to reduce the load help]Daniel Vogel ---07/02/2015 12:12:46 AM---Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as From: Daniel Vogel > To: "'gpfsug main discussion list'" > Date: 07/02/2015 12:12 AM Subject: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . 
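Putting the two suggestions above together, a minimal sketch (the file system name gpfs01 and the node name spare01 are placeholders; pitWorkerThreadsPerNode is the same parameter shown earlier):

  mmchconfig pitWorkerThreadsPerNode=1 -i        # throttle the restripe worker threads
  mmrestripefs gpfs01 -b -N spare01              # run the rebalance only on a node with no performance-critical role
  mmchconfig pitWorkerThreadsPerNode=DEFAULT -i  # restore the default afterwards

The cNFS thread count Sven asks about can be checked on an NFS server with "cat /proc/fs/nfsd/threads", as shown further up.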
sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Beschreibung: Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS]Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 10 15:56:04 2015 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 10 Jul 2015 14:56:04 +0000 Subject: [gpfsug-discuss] Fwd: GPFS 4.1, NFSv4, and authenticating against AD References: <69C83493-2E22-4B11-BF15-A276DA6D4901@vanderbilt.edu> Message-ID: <55426129-67A0-4071-91F4-715BAC1F0DBE@vanderbilt.edu> Begin forwarded message: From: buterbkl > Subject: GPFS 4.1, NFSv4, and authenticating against AD Date: July 10, 2015 at 9:52:38 AM CDT To: gpfs-general at sdsc.edu Hi All, We are under the (hopefully not mistaken) impression that with GPFS 4.1 supporting NFSv4 it should be possible to have a CNFS setup authenticate against Active Directory as long as you use NFSv4. I also thought that I had seen somewhere (possibly one of the two GPFS related mailing lists I?m on, or in a DeveloperWorks article, or ???) that IBM has published documentation on how to set this up (a kind of cookbook). I?ve done a fair amount of Googling looking for such a document, but I seem to be uniquely talented in not being able to find things with Google! :-( Does anyone know of such a document and could send me the link to it? It would be very helpful to us as I?ve got essentially zero experience with Kerberos (which I think is required to talk to AD) and the institutions? AD environment is managed by a separate department. Thanks in advance? Kevin ? 
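For anyone else starting from zero on the Kerberos piece: talking to AD from a Linux box generally begins with a realm definition in /etc/krb5.conf along these lines (a generic sketch only; the realm and KDC names are placeholders and none of this is GPFS-specific):

  [libdefaults]
      default_realm = ADS.EXAMPLE.EDU
  [realms]
      ADS.EXAMPLE.EDU = {
          kdc = dc1.ads.example.edu
          admin_server = dc1.ads.example.edu
      }
  [domain_realm]
      .example.edu = ADS.EXAMPLE.EDU

A plain "kinit someuser@ADS.EXAMPLE.EDU" is a useful sanity check that the realm is reachable before any CNFS/NFSv4 configuration is attempted.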
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Mon Jul 13 13:31:18 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Mon, 13 Jul 2015 13:31:18 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <559F9960.7010509@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> Message-ID: <55A3AF96.3060303@ebi.ac.uk> Anyone? On 10/07/15 11:07, Salvatore Di Nardo wrote: > Hello guys. > Quite a while ago i mentioned that we have a big expel issue on our > gss ( first gen) and white a lot people suggested that the root cause > could be that we use the same interface for all the traffic, and that > we should split the data network from the admin network. Finally we > could plan a downtime and we are migrating the data out so, i can soon > safelly play with the change, but looking what exactly i should to do > i'm a bit puzzled. Our mmlscluster looks like this: > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name > Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk > quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk > quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk > quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk > quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk > quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk > quorum-manager > > > It was my understanding that the "admin node" should use a different > interface ( a 1g link copper should be fine), while the daemon node is > where the data was passing , so should point to the bonded 10g > interfaces. but when i read the mmchnode man page i start to be quite > confused. It says: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address _*to > be used by the GPFS daemons for node-to-node communication*_. The > host name or IP address must refer to the communication adapter over > which the GPFS daemons communicate. > Alias interfaces are not allowed. Use the > original address or a name that is resolved by the host command to > that original address. > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by > GPFS administration commands when communicating between nodes. The > admin node name must be specified as an IP address or a hostname that > is resolved by the host command > tothe desired IP address. If the keyword > DEFAULT is specified, the admin interface for the node is set to be > equal to the daemon interface for the node. > > What exactly means "node-to node-communications" ? > Means DATA or also the "lease renew", and the token communication > between the clients to get/steal the locks to be able to manage > concurrent write to thr same file? 
> Since we are getting expells ( especially when several clients > contends the same file ) i assumed i have to split this type of > packages from the data stream, but reading the documentation it looks > to me that those internal comunication between nodes use the > daemon-interface wich i suppose are used also for the data. so HOW > exactly i can split them? > > > Thanks in advance, > Salvatore > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Mon Jul 13 14:29:50 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Mon, 13 Jul 2015 14:29:50 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> Message-ID: <55A3BD4E.3000205@ebi.ac.uk> Hello Vic. We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: --daemon-interface={hostname | ip_address} Specifies the host name or IP address _*to be used by the GPFS daemons for node-to-node communication*_. The host name or IP address must refer to the commu- nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. Regards, Salvatore On 13/07/15 14:00, Vic Cornell wrote: > Hi Salavatore, > > Does your GSS have the facility for a 1GbE ?management? network? If so > I think that changing the ?admin? node names of the cluster members to > a set of IPs on the management network would give you the split that > you need. > > What about the clients? Can they also connect to a separate admin network? > > Remember that if you are using multi-cluster all of the nodes in both > networks must share the same admin network. > > Kind Regards, > > Vic > > >> On 13 Jul 2015, at 13:31, Salvatore Di Nardo > > wrote: >> >> Anyone? >> >> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>> Hello guys. 
>>> Quite a while ago i mentioned that we have a big expel issue on our >>> gss ( first gen) and white a lot people suggested that the root >>> cause could be that we use the same interface for all the traffic, >>> and that we should split the data network from the admin network. >>> Finally we could plan a downtime and we are migrating the data out >>> so, i can soon safelly play with the change, but looking what >>> exactly i should to do i'm a bit puzzled. Our mmlscluster looks like >>> this: >>> >>> GPFS cluster information >>> ======================== >>> GPFS cluster name: GSS.ebi.ac.uk >>> GPFS cluster id: 17987981184946329605 >>> GPFS UID domain: GSS.ebi.ac.uk >>> Remote shell command: /usr/bin/ssh >>> Remote file copy command: /usr/bin/scp >>> >>> GPFS cluster configuration servers: >>> ----------------------------------- >>> Primary server: gss01a.ebi.ac.uk >>> Secondary server: gss02b.ebi.ac.uk >>> >>> Node Daemon node name IP address Admin node >>> name Designation >>> ----------------------------------------------------------------------- >>> 1 gss01a.ebi.ac.uk >>> 10.7.28.2 gss01a.ebi.ac.uk >>> quorum-manager >>> 2 gss01b.ebi.ac.uk >>> 10.7.28.3 gss01b.ebi.ac.uk >>> quorum-manager >>> 3 gss02a.ebi.ac.uk >>> 10.7.28.67 gss02a.ebi.ac.uk >>> quorum-manager >>> 4 gss02b.ebi.ac.uk >>> 10.7.28.66 gss02b.ebi.ac.uk >>> quorum-manager >>> 5 gss03a.ebi.ac.uk >>> 10.7.28.34 gss03a.ebi.ac.uk >>> quorum-manager >>> 6 gss03b.ebi.ac.uk >>> 10.7.28.35 gss03b.ebi.ac.uk >>> quorum-manager >>> >>> >>> It was my understanding that the "admin node" should use a different >>> interface ( a 1g link copper should be fine), while the daemon node >>> is where the data was passing , so should point to the bonded 10g >>> interfaces. but when i read the mmchnode man page i start to be >>> quite confused. It says: >>> >>> --daemon-interface={hostname | ip_address} >>> Specifies the host name or IP address >>> _*to be used by the GPFS daemons for node-to-node communication*_. >>> The host name or IP address must refer to the communication adapter >>> over which the GPFS daemons communicate. >>> Alias interfaces are not allowed. Use the >>> original address or a name that is resolved by the host command to >>> that original address. >>> >>> --admin-interface={hostname | ip_address} >>> Specifies the name of the node to be used >>> by GPFS administration commands when communicating between nodes. >>> The admin node name must be specified as an IP address or a hostname >>> that is resolved by the host command >>> tothe desired IP address. If the keyword >>> DEFAULT is specified, the admin interface for the node is set to be >>> equal to the daemon interface for the node. >>> >>> What exactly means "node-to node-communications" ? >>> Means DATA or also the "lease renew", and the token communication >>> between the clients to get/steal the locks to be able to manage >>> concurrent write to thr same file? >>> Since we are getting expells ( especially when several clients >>> contends the same file ) i assumed i have to split this type of >>> packages from the data stream, but reading the documentation it >>> looks to me that those internal comunication between nodes use the >>> daemon-interface wich i suppose are used also for the data. so HOW >>> exactly i can split them? 
>>> >>> >>> Thanks in advance, >>> Salvatore >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss atgpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From viccornell at gmail.com Mon Jul 13 15:25:32 2015 From: viccornell at gmail.com (Vic Cornell) Date: Mon, 13 Jul 2015 15:25:32 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A3BD4E.3000205@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: Hi Salvatore, I agree that that is what the manual - and some of the wiki entries say. However , when we have had problems (typically congestion) with ethernet networks in the past (20GbE or 40GbE) we have resolved them by setting up a separate ?Admin? network. The before and after cluster health we have seen measured in number of expels and waiters has been very marked. Maybe someone ?in the know? could comment on this split. Regards, Vic > On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: > > Hello Vic. > We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP > address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the > node is set to be equal to the daemon interface for the node. > > So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the commu- > nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the > host command to that original address. > > The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. > > > Regards, > Salvatore > > > > On 13/07/15 14:00, Vic Cornell wrote: >> Hi Salavatore, >> >> Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. >> >> What about the clients? Can they also connect to a separate admin network? >> >> Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. 
>> >> Kind Regards, >> >> Vic >> >> >>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo > wrote: >>> >>> Anyone? >>> >>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>> Hello guys. >>>> Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: >>>> >>>> GPFS cluster information >>>> ======================== >>>> GPFS cluster name: GSS.ebi.ac.uk >>>> GPFS cluster id: 17987981184946329605 >>>> GPFS UID domain: GSS.ebi.ac.uk >>>> Remote shell command: /usr/bin/ssh >>>> Remote file copy command: /usr/bin/scp >>>> >>>> GPFS cluster configuration servers: >>>> ----------------------------------- >>>> Primary server: gss01a.ebi.ac.uk >>>> Secondary server: gss02b.ebi.ac.uk >>>> >>>> Node Daemon node name IP address Admin node name Designation >>>> ----------------------------------------------------------------------- >>>> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >>>> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >>>> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >>>> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >>>> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >>>> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >>>> >>>> It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: >>>> >>>> --daemon-interface={hostname | ip_address} >>>> Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. >>>> Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. >>>> >>>> --admin-interface={hostname | ip_address} >>>> Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command >>>> to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. >>>> >>>> What exactly means "node-to node-communications" ? >>>> Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? >>>> Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? 
>>>> >>>> >>>> Thanks in advance, >>>> Salvatore >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jhick at lbl.gov Mon Jul 13 16:22:58 2015 From: jhick at lbl.gov (Jason Hick) Date: Mon, 13 Jul 2015 08:22:58 -0700 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: Hi, Yes having separate data and management networks has been critical for us for keeping health monitoring/communication unimpeded by data movement. Not as important, but you can also tune the networks differently (packet sizes, buffer sizes, SAK, etc) which can help. Jason > On Jul 13, 2015, at 7:25 AM, Vic Cornell wrote: > > Hi Salvatore, > > I agree that that is what the manual - and some of the wiki entries say. > > However , when we have had problems (typically congestion) with ethernet networks in the past (20GbE or 40GbE) we have resolved them by setting up a separate ?Admin? network. > > The before and after cluster health we have seen measured in number of expels and waiters has been very marked. > > Maybe someone ?in the know? could comment on this split. > > Regards, > > Vic > > >> On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: >> >> Hello Vic. >> We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: >> >> --admin-interface={hostname | ip_address} >> Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP >> address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the >> node is set to be equal to the daemon interface for the node. >> >> So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: >> >> --daemon-interface={hostname | ip_address} >> Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the commu- >> nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the >> host command to that original address. >> >> The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. 
>> >> >> Regards, >> Salvatore >> >> >> >>> On 13/07/15 14:00, Vic Cornell wrote: >>> Hi Salavatore, >>> >>> Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. >>> >>> What about the clients? Can they also connect to a separate admin network? >>> >>> Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. >>> >>> Kind Regards, >>> >>> Vic >>> >>> >>>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo wrote: >>>> >>>> Anyone? >>>> >>>>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>>> Hello guys. >>>>> Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: >>>>> >>>>> GPFS cluster information >>>>> ======================== >>>>> GPFS cluster name: GSS.ebi.ac.uk >>>>> GPFS cluster id: 17987981184946329605 >>>>> GPFS UID domain: GSS.ebi.ac.uk >>>>> Remote shell command: /usr/bin/ssh >>>>> Remote file copy command: /usr/bin/scp >>>>> >>>>> GPFS cluster configuration servers: >>>>> ----------------------------------- >>>>> Primary server: gss01a.ebi.ac.uk >>>>> Secondary server: gss02b.ebi.ac.uk >>>>> >>>>> Node Daemon node name IP address Admin node name Designation >>>>> ----------------------------------------------------------------------- >>>>> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >>>>> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >>>>> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >>>>> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >>>>> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >>>>> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >>>>> >>>>> It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: >>>>> >>>>> --daemon-interface={hostname | ip_address} >>>>> Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. >>>>> Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. >>>>> >>>>> --admin-interface={hostname | ip_address} >>>>> Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command >>>>> to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. >>>>> >>>>> What exactly means "node-to node-communications" ? 
>>>>> Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? >>>>> Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? >>>>> >>>>> >>>>> Thanks in advance, >>>>> Salvatore >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdenham at gmail.com Mon Jul 13 17:45:48 2015 From: sdenham at gmail.com (Scott D) Date: Mon, 13 Jul 2015 11:45:48 -0500 Subject: [gpfsug-discuss] data interface and management infercace. Message-ID: I spent a good deal of time exploring this topic when I was at IBM. I think there are two key aspects here; the congestion of the actual interfaces on the [cluster, FS, token] management nodes and competition for other resources like CPU cycles on those nodes. When using a single Ethernet interface (or for that matter IB RDMA + IPoIB over the same interface), at some point the two kinds of traffic begin to conflict. The management traffic being much more time sensitive suffers as a result. One solution is to separate the traffic. For larger clusters though (1000s of nodes), a better solution, that may avoid having to have a 2nd interface on every client node, is to add dedicated nodes as managers and not rely on NSD servers for this. It does cost you some modest servers and GPFS server licenses. My previous client generally used previous-generation retired compute nodes for this job. Scott Date: Mon, 13 Jul 2015 15:25:32 +0100 > From: Vic Cornell > Subject: Re: [gpfsug-discuss] data interface and management infercace. > > Hi Salvatore, > > I agree that that is what the manual - and some of the wiki entries say. > > However , when we have had problems (typically congestion) with ethernet > networks in the past (20GbE or 40GbE) we have resolved them by setting up a > separate ?Admin? network. > > The before and after cluster health we have seen measured in number of > expels and waiters has been very marked. > > Maybe someone ?in the know? could comment on this split. > > Regards, > > Vic > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mhabib73 at gmail.com Mon Jul 13 18:19:36 2015 From: mhabib73 at gmail.com (Muhammad Habib) Date: Mon, 13 Jul 2015 13:19:36 -0400 Subject: [gpfsug-discuss] data interface and management infercace. 
In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: Did you look at "subnets" parameter used with "mmchconfig" command. I think you can use order list of subnets for daemon communication and then actual daemon interface can be used for data transfer. When the GPFS will start it will use actual daemon interface for communication , however , once its started , it will use the IPs from the subnet list whichever coming first in the list. To further validate , you can put network sniffer before you do actual implementation or alternatively you can open a PMR with IBM. If your cluster having expel situation , you may fine tune your cluster e.g. increase ping timeout period , having multiple NSD servers and distributing filesystems across these NSD servers. Also critical servers can have HBA cards installed for direct I/O through fiber. Thanks On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick wrote: > Hi, > > Yes having separate data and management networks has been critical for us > for keeping health monitoring/communication unimpeded by data movement. > > Not as important, but you can also tune the networks differently (packet > sizes, buffer sizes, SAK, etc) which can help. > > Jason > > On Jul 13, 2015, at 7:25 AM, Vic Cornell wrote: > > Hi Salvatore, > > I agree that that is what the manual - and some of the wiki entries say. > > However , when we have had problems (typically congestion) with ethernet > networks in the past (20GbE or 40GbE) we have resolved them by setting up a > separate ?Admin? network. > > The before and after cluster health we have seen measured in number of > expels and waiters has been very marked. > > Maybe someone ?in the know? could comment on this split. > > Regards, > > Vic > > > On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: > > Hello Vic. > We are currently draining our gpfs to do all the recabling to add a > management network, but looking what the admin interface does ( man > mmchnode ) it says something different: > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS > administration commands when communicating between nodes. The admin node > name must be specified as an IP > address or a hostname that is resolved by the > host command to the desired IP address. If the keyword DEFAULT is > specified, the admin interface for the > node is set to be equal to the daemon interface > for the node. > > > So, seems used only for commands propagation, hence have nothing to do > with the node-to-node traffic. Infact the other interface description is: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address *to be > used by the GPFS daemons for node-to-node communication*. The host name > or IP address must refer to the commu- > nication adapter over which the GPFS daemons > communicate. Alias interfaces are not allowed. Use the original address or > a name that is resolved by the > host command to that original address. > > > The "expired lease" issue and file locking mechanism a( most of our > expells happens when 2 clients try to write in the same file) are exactly > node-to node-comunication, so im wondering what's the point to separate > the "admin network". I want to be sure to plan the right changes before we > do a so massive task. We are talking about adding a new interface on 700 > clients, so the recabling work its not small. 
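To make the subnets suggestion at the top of this message concrete, a minimal sketch (the 10.7.29.0 network is made up for illustration; check the mmchconfig documentation on your release for the exact syntax and whether a daemon restart is needed for it to take effect):

  mmchconfig subnets="10.7.29.0"   # daemons prefer addresses on this subnet for node-to-node traffic
  mmlsconfig subnets               # confirm the setting
  mmdiag --network                 # after a restart, shows which addresses the connections actually use

The idea, as described above, is that the configured daemon node name stays the same while established daemon connections move onto the listed subnet.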
> > > Regards, > Salvatore > > > > On 13/07/15 14:00, Vic Cornell wrote: > > Hi Salavatore, > > Does your GSS have the facility for a 1GbE ?management? network? If so I > think that changing the ?admin? node names of the cluster members to a set > of IPs on the management network would give you the split that you need. > > What about the clients? Can they also connect to a separate admin > network? > > Remember that if you are using multi-cluster all of the nodes in both > networks must share the same admin network. > > Kind Regards, > > Vic > > > On 13 Jul 2015, at 13:31, Salvatore Di Nardo wrote: > > Anyone? > > On 10/07/15 11:07, Salvatore Di Nardo wrote: > > Hello guys. > Quite a while ago i mentioned that we have a big expel issue on our gss ( > first gen) and white a lot people suggested that the root cause could be > that we use the same interface for all the traffic, and that we should > split the data network from the admin network. Finally we could plan a > downtime and we are migrating the data out so, i can soon safelly play with > the change, but looking what exactly i should to do i'm a bit puzzled. Our > mmlscluster looks like this: > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > It was my understanding that the "admin node" should use a different > interface ( a 1g link copper should be fine), while the daemon node is > where the data was passing , so should point to the bonded 10g interfaces. > but when i read the mmchnode man page i start to be quite confused. It says: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address *to be > used by the GPFS daemons for node-to-node communication*. The host name > or IP address must refer to the communication adapter over which the GPFS > daemons communicate. > Alias interfaces are not allowed. Use the > original address or a name that is resolved by the host command to that > original address. > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS > administration commands when communicating between nodes. The admin node > name must be specified as an IP address or a hostname that is resolved by > the host command > to the desired IP address. If the keyword > DEFAULT is specified, the admin interface for the node is set to be equal > to the daemon interface for the node. > > What exactly means "node-to node-communications" ? > Means DATA or also the "lease renew", and the token communication between > the clients to get/steal the locks to be able to manage concurrent write to > thr same file? 
> Since we are getting expells ( especially when several clients contends > the same file ) i assumed i have to split this type of packages from the > data stream, but reading the documentation it looks to me that those > internal comunication between nodes use the daemon-interface wich i suppose > are used also for the data. so HOW exactly i can split them? > > > Thanks in advance, > Salvatore > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This communication contains confidential information intended only for the persons to whom it is addressed. Any other distribution, copying or disclosure is strictly prohibited. If you have received this communication in error, please notify the sender and delete this e-mail message immediately. Le pr?sent message contient des renseignements de nature confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, distribution, divulgation, utilisation ou reproduction de la pr?sente communication, et de tout fichier qui y est joint, est strictement interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur et supprimer le message de votre ordinateur et de votre serveur. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Mon Jul 13 18:42:47 2015 From: oester at gmail.com (Bob Oesterlin) Date: Mon, 13 Jul 2015 12:42:47 -0500 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: Message-ID: Some thoughts on node expels, based on the last 2-3 months of "expel hell" here. We've spent a lot of time looking at this issue, across multiple clusters. A big thanks to IBM for helping us center in on the right issues. First, you need to understand if the expels are due to "expired lease" message, or expels due to "communication issues". It sounds like you are talking about the latter. In the case of nodes being expelled due to communication issues, it's more likely the problem in related to network congestion. This can occur at many levels - the node, the network, or the switch. When it's a communication issue, changing prams like "missed ping timeout" isn't going to help you. The problem for us ended up being that GPFS wasn't getting a response to a periodic "keep alive" poll to the node, and after 300 seconds, it declared the node dead and expelled it. You can tell if this is the issue by starting to look at the RPC waiters just before the expel. If you see something like "Waiting for poll on sock" RPC, that the node is waiting for that periodic poll to return, and it's not seeing it. The response is either lost in the network, sitting on the network queue, or the node is too busy to send it. 
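A quick way to see those waiters while it is happening (a sketch; the exact output format varies by release):

  mmdiag --waiters                               # long-running RPCs, with how long each has been waiting
  sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem     # current values of the TCP buffer settings mentioned below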
You may also see RPC's like "waiting for exclusive use of connection" RPC - this is another clear indication of network congestion. Look at the GPFSUG presentions (http://www.gpfsug.org/presentations/) for one by Jason Hick (NERSC) - he also talks about these issues. You need to take a look at net.ipv4.tcp_wmem and net.ipv4.tcp_rmem, especially if you have client nodes that are on slower network interfaces. In our case, it was a number of factors - adjusting these settings, looking at congestion at the switch level, and some physical hardware issues. I would be happy to discuss in more detail (offline) if you want). There are no simple solutions. :-) Bob Oesterlin, Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Mon, Jul 13, 2015 at 11:45 AM, Scott D wrote: > I spent a good deal of time exploring this topic when I was at IBM. I > think there are two key aspects here; the congestion of the actual > interfaces on the [cluster, FS, token] management nodes and competition for > other resources like CPU cycles on those nodes. When using a single > Ethernet interface (or for that matter IB RDMA + IPoIB over the same > interface), at some point the two kinds of traffic begin to conflict. The > management traffic being much more time sensitive suffers as a result. One > solution is to separate the traffic. For larger clusters though (1000s of > nodes), a better solution, that may avoid having to have a 2nd interface on > every client node, is to add dedicated nodes as managers and not rely on > NSD servers for this. It does cost you some modest servers and GPFS server > licenses. My previous client generally used previous-generation retired > compute nodes for this job. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hagley at cscs.ch Tue Jul 14 08:31:04 2015 From: hagley at cscs.ch (Hagley Birgit) Date: Tue, 14 Jul 2015 07:31:04 +0000 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A3BD4E.3000205@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com>, <55A3BD4E.3000205@ebi.ac.uk> Message-ID: <97B2355E006F044E9B8518711889B13719CF3810@MBX114.d.ethz.ch> Hello Salvatore, as you wrote that you have about 700 clients, maybe also the tuning recommendations for large GPFS clusters are helpful for you. They are on the developerworks GPFS wiki: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20Network%20Tuning To my experience especially "failureDetectionTime" and "minMissedPingTimeout" may help in case of expelled nodes. In case you use InfiniBand, for RDMA, there also is a "Best Practices RDMA Tuning" page: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20RDMA%20Tuning Regards Birgit ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Monday, July 13, 2015 3:29 PM To: Vic Cornell Cc: gpfsug main discussion list Subject: Re: [gpfsug-discuss] data interface and management infercace. Hello Vic. 
We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the commu- nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. Regards, Salvatore On 13/07/15 14:00, Vic Cornell wrote: Hi Salavatore, Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. What about the clients? Can they also connect to a separate admin network? Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. Kind Regards, Vic On 13 Jul 2015, at 13:31, Salvatore Di Nardo > wrote: Anyone? On 10/07/15 11:07, Salvatore Di Nardo wrote: Hello guys. Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. 
Our mmlscluster looks like this: GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. What exactly means "node-to node-communications" ? Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? Thanks in advance, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Jul 14 09:15:26 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Jul 2015 09:15:26 +0100 Subject: [gpfsug-discuss] data interface and management infercace. 
In-Reply-To: <97B2355E006F044E9B8518711889B13719CF3810@MBX114.d.ethz.ch> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com>, <55A3BD4E.3000205@ebi.ac.uk> <97B2355E006F044E9B8518711889B13719CF3810@MBX114.d.ethz.ch> Message-ID: <55A4C51E.8050606@ebi.ac.uk> Thanks, this has already been done ( without too much success). We need to rearrange the networking and since somebody experience was to add a copper interface for management i want to do the same, so i'm digging a bit to aundertsand the best way yo do it. Regards, Salvatore On 14/07/15 08:31, Hagley Birgit wrote: > Hello Salvatore, > > as you wrote that you have about 700 clients, maybe also the tuning > recommendations for large GPFS clusters are helpful for you. They are > on the developerworks GPFS wiki: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20Network%20Tuning > > > > To my experience especially "failureDetectionTime" and > "minMissedPingTimeout" may help in case of expelled nodes. > > > In case you use InfiniBand, for RDMA, there also is a "Best Practices > RDMA Tuning" page: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20RDMA%20Tuning > > > > > Regards > Birgit > > ------------------------------------------------------------------------ > *From:* gpfsug-discuss-bounces at gpfsug.org > [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo > [sdinardo at ebi.ac.uk] > *Sent:* Monday, July 13, 2015 3:29 PM > *To:* Vic Cornell > *Cc:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] data interface and management infercace. > > Hello Vic. > We are currently draining our gpfs to do all the recabling to add a > management network, but looking what the admin interface does ( man > mmchnode ) it says something different: > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS > administration commands when communicating between nodes. The > admin node name must be specified as an IP > address or a hostname that is resolved by the host command to > the desired IP address. If the keyword DEFAULT is specified, > the admin interface for the > node is set to be equal to the daemon interface for the node. > > > So, seems used only for commands propagation, hence have nothing to > do with the node-to-node traffic. Infact the other interface > description is: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address _*to be used by the GPFS > daemons for node-to-node communication*_. The host name or IP > address must refer to the commu- > nication adapter over which the GPFS daemons communicate. > Alias interfaces are not allowed. Use the original address or > a name that is resolved by the > host command to that original address. > > > The "expired lease" issue and file locking mechanism a( most of our > expells happens when 2 clients try to write in the same file) are > exactly node-to node-comunication, so im wondering what's the point to > separate the "admin network". I want to be sure to plan the right > changes before we do a so massive task. We are talking about adding a > new interface on 700 clients, so the recabling work its not small. 
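For what it's worth, the admin-interface change itself is small compared to the recabling; a rough sketch of it (the *-mgmt hostname is invented and would have to resolve to the 1GbE address, and the node will likely need GPFS stopped while it is changed):

  # Point the admin interface of one node at a management-network name while
  # the daemon interface stays on the bonded 10GbE (hostname is hypothetical).
  mmchnode --admin-interface=gss01a-mgmt.ebi.ac.uk -N gss01a.ebi.ac.uk
  # Check the "Admin node name" column afterwards:
  mmlscluster
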
> > > Regards, > Salvatore > > > > On 13/07/15 14:00, Vic Cornell wrote: >> Hi Salavatore, >> >> Does your GSS have the facility for a 1GbE ?management? network? If >> so I think that changing the ?admin? node names of the cluster >> members to a set of IPs on the management network would give you the >> split that you need. >> >> What about the clients? Can they also connect to a separate admin >> network? >> >> Remember that if you are using multi-cluster all of the nodes in both >> networks must share the same admin network. >> >> Kind Regards, >> >> Vic >> >> >>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo >> > wrote: >>> >>> Anyone? >>> >>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>> Hello guys. >>>> Quite a while ago i mentioned that we have a big expel issue on >>>> our gss ( first gen) and white a lot people suggested that the root >>>> cause could be that we use the same interface for all the traffic, >>>> and that we should split the data network from the admin network. >>>> Finally we could plan a downtime and we are migrating the data out >>>> so, i can soon safelly play with the change, but looking what >>>> exactly i should to do i'm a bit puzzled. Our mmlscluster looks >>>> like this: >>>> >>>> GPFS cluster information >>>> ======================== >>>> GPFS cluster name: GSS.ebi.ac.uk >>>> GPFS cluster id: 17987981184946329605 >>>> GPFS UID domain: GSS.ebi.ac.uk >>>> Remote shell command: /usr/bin/ssh >>>> Remote file copy command: /usr/bin/scp >>>> >>>> GPFS cluster configuration servers: >>>> ----------------------------------- >>>> Primary server: gss01a.ebi.ac.uk >>>> Secondary server: gss02b.ebi.ac.uk >>>> >>>> >>>> Node Daemon node name IP address Admin node >>>> name Designation >>>> ----------------------------------------------------------------------- >>>> 1 gss01a.ebi.ac.uk >>>> 10.7.28.2 gss01a.ebi.ac.uk >>>> quorum-manager >>>> 2 gss01b.ebi.ac.uk >>>> 10.7.28.3 gss01b.ebi.ac.uk >>>> quorum-manager >>>> 3 gss02a.ebi.ac.uk >>>> 10.7.28.67 gss02a.ebi.ac.uk >>>> quorum-manager >>>> 4 gss02b.ebi.ac.uk >>>> 10.7.28.66 gss02b.ebi.ac.uk >>>> quorum-manager >>>> 5 gss03a.ebi.ac.uk >>>> 10.7.28.34 gss03a.ebi.ac.uk >>>> quorum-manager >>>> 6 gss03b.ebi.ac.uk >>>> 10.7.28.35 gss03b.ebi.ac.uk >>>> quorum-manager >>>> >>>> >>>> It was my understanding that the "admin node" should use a >>>> different interface ( a 1g link copper should be fine), while the >>>> daemon node is where the data was passing , so should point to the >>>> bonded 10g interfaces. but when i read the mmchnode man page i >>>> start to be quite confused. It says: >>>> >>>> --daemon-interface={hostname | ip_address} >>>> Specifies the host name or IP address _*to be used by the GPFS >>>> daemons for node-to-node communication*_. The host name or IP >>>> address must refer to the communication adapter over which the GPFS >>>> daemons communicate. >>>> Alias interfaces are not allowed. Use the >>>> original address or a name that is resolved by the host command to >>>> that original address. >>>> >>>> --admin-interface={hostname | ip_address} >>>> Specifies the name of the node to be used by GPFS administration >>>> commands when communicating between nodes. The admin node name must >>>> be specified as an IP address or a hostname that is resolved by >>>> the host command >>>> tothe desired IP address. If the keyword >>>> DEFAULT is specified, the admin interface for the node is set to be >>>> equal to the daemon interface for the node. >>>> >>>> What exactly means "node-to node-communications" ? 
>>>> Means DATA or also the "lease renew", and the token communication >>>> between the clients to get/steal the locks to be able to manage >>>> concurrent write to thr same file? >>>> Since we are getting expells ( especially when several clients >>>> contends the same file ) i assumed i have to split this type of >>>> packages from the data stream, but reading the documentation it >>>> looks to me that those internal comunication between nodes use the >>>> daemon-interface wich i suppose are used also for the data. so HOW >>>> exactly i can split them? >>>> >>>> >>>> Thanks in advance, >>>> Salvatore >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss atgpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Tue Jul 14 16:11:51 2015 From: jtucker at pixitmedia.com (Jez Tucker) Date: Tue, 14 Jul 2015 16:11:51 +0100 Subject: [gpfsug-discuss] Vim highlighting for GPFS available Message-ID: <55A526B7.6080602@pixitmedia.com> Hi everyone, I've released vim highlighting for GPFS policies as a public git repo. https://github.com/arcapix/vim-gpfs Pull requests welcome. Please enjoy your new colourful world. Jez p.s. Apologies to Emacs users. Head of R&D ArcaStream/Pixit Media -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. From jonbernard at gmail.com Wed Jul 15 09:19:49 2015 From: jonbernard at gmail.com (Jon Bernard) Date: Wed, 15 Jul 2015 10:19:49 +0200 Subject: [gpfsug-discuss] GPFS UG 10 Presentations - Sven Oehme In-Reply-To: References: Message-ID: If I may revive this: is trcio publicly available? Jon Bernard On Fri, May 2, 2014 at 5:06 PM, Bob Oesterlin wrote: > It Sven's presentation, he mentions a tools "trcio" (in > /xcat/oehmes/gpfs-clone) > > Where can I find that? > > Bob Oesterlin > > > > On Fri, May 2, 2014 at 9:49 AM, Jez Tucker (Chair) > wrote: > >> Hello all >> >> Firstly, thanks for the feedback we've had so far. Very much >> appreciated. >> >> Secondly, GPFS UG 10 Presentations are now available on the Presentations >> section of the website. >> Any outstanding presentations will follow shortly. 
>> >> See: http://www.gpfsug.org/ >> >> Best regards, >> >> Jez >> >> UG Chair >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Wed Jul 15 10:19:58 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 15 Jul 2015 10:19:58 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: <55A625BE.9000809@ebi.ac.uk> Thanks for the input.. this is actually very interesting! Reading here: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview , specifically the " Using more than one network" part it seems to me that this way we should be able to split the lease/token/ping from the data. Supposing that I implement a GSS cluster with only NDS and a second cluster with only clients: As far i understood if on the NDS cluster add first the subnet 10.20.0.0/16 and then 10.30.0.0 is should use the internal network for all the node-to-node comunication, leaving the 10.30.0.0/30 only for data traffic witht he remote cluster ( the clients). Similarly, in the client cluster, adding first 10.10.0.0/16 and then 10.30.0.0, will guarantee than the node-to-node comunication pass trough a different interface there the data is passing. Since the client are just "clients" the traffic trough 10.10.0.0/16 should be minimal (only token ,lease, ping and so on ) and not affected by the rest. Should be possible at this point move aldo the "admin network" on the internal interface, so we effectively splitted all the "non data" traffic on a dedicated interface. I'm wondering if I'm missing something, and in case i didn't, what could be the real traffic in the internal (black) networks ( 1g link its fine or i still need 10g for that). Another thing I I'm wondering its the load of the "non data" traffic between the clusters.. i suppose some "daemon traffic" goes trough the blue interface for the inter-cluster communication. Any thoughts ? Salvatore On 13/07/15 18:19, Muhammad Habib wrote: > Did you look at "subnets" parameter used with "mmchconfig" command. I > think you can use order list of subnets for daemon communication and > then actual daemon interface can be used for data transfer. When the > GPFS will start it will use actual daemon interface for communication > , however , once its started , it will use the IPs from the subnet > list whichever coming first in the list. To further validate , you > can put network sniffer before you do actual implementation or > alternatively you can open a PMR with IBM. > > If your cluster having expel situation , you may fine tune your > cluster e.g. increase ping timeout period , having multiple NSD > servers and distributing filesystems across these NSD servers. Also > critical servers can have HBA cards installed for direct I/O through > fiber. 
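To make that quoted "subnets" suggestion concrete for the layout sketched above, something along these lines should do it (the addresses are the example ones from this thread; check the mmchconfig man page for the exact mask/cluster-list syntax, and note the setting only takes effect once mmfsd is restarted):

  # On the GSS/NSD server cluster: prefer the internal 10.20.0.0 network for
  # daemon-to-daemon traffic within the cluster; the 10.30.0.0 daemon addresses
  # remain what the remote client cluster connects to for data.
  mmchconfig subnets="10.20.0.0"

  # On the client cluster: prefer its own internal 10.10.0.0 network.
  mmchconfig subnets="10.10.0.0"
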
> > Thanks > > On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick > wrote: > > Hi, > > Yes having separate data and management networks has been critical > for us for keeping health monitoring/communication unimpeded by > data movement. > > Not as important, but you can also tune the networks differently > (packet sizes, buffer sizes, SAK, etc) which can help. > > Jason > > On Jul 13, 2015, at 7:25 AM, Vic Cornell > wrote: > >> Hi Salvatore, >> >> I agree that that is what the manual - and some of the wiki >> entries say. >> >> However , when we have had problems (typically congestion) with >> ethernet networks in the past (20GbE or 40GbE) we have resolved >> them by setting up a separate ?Admin? network. >> >> The before and after cluster health we have seen measured in >> number of expels and waiters has been very marked. >> >> Maybe someone ?in the know? could comment on this split. >> >> Regards, >> >> Vic >> >> >>> On 13 Jul 2015, at 14:29, Salvatore Di Nardo >> > wrote: >>> >>> Hello Vic. >>> We are currently draining our gpfs to do all the recabling to >>> add a management network, but looking what the admin interface >>> does ( man mmchnode ) it says something different: >>> >>> --admin-interface={hostname | ip_address} >>> Specifies the name of the node to be used by GPFS >>> administration commands when communicating between >>> nodes. The admin node name must be specified as an IP >>> address or a hostname that is resolved by the host >>> command to the desired IP address. If the keyword >>> DEFAULT is specified, the admin interface for the >>> node is set to be equal to the daemon interface for the >>> node. >>> >>> >>> So, seems used only for commands propagation, hence have >>> nothing to do with the node-to-node traffic. Infact the other >>> interface description is: >>> >>> --daemon-interface={hostname | ip_address} >>> Specifies the host name or IP address _*to be used by >>> the GPFS daemons for node-to-node communication*_. The >>> host name or IP address must refer to the commu- >>> nication adapter over which the GPFS daemons >>> communicate. Alias interfaces are not allowed. Use the >>> original address or a name that is resolved by the >>> host command to that original address. >>> >>> >>> The "expired lease" issue and file locking mechanism a( most of >>> our expells happens when 2 clients try to write in the same >>> file) are exactly node-to node-comunication, so im wondering >>> what's the point to separate the "admin network". I want to be >>> sure to plan the right changes before we do a so massive task. >>> We are talking about adding a new interface on 700 clients, so >>> the recabling work its not small. >>> >>> >>> Regards, >>> Salvatore >>> >>> >>> >>> On 13/07/15 14:00, Vic Cornell wrote: >>>> Hi Salavatore, >>>> >>>> Does your GSS have the facility for a 1GbE ?management? >>>> network? If so I think that changing the ?admin? node names of >>>> the cluster members to a set of IPs on the management network >>>> would give you the split that you need. >>>> >>>> What about the clients? Can they also connect to a separate >>>> admin network? >>>> >>>> Remember that if you are using multi-cluster all of the nodes >>>> in both networks must share the same admin network. >>>> >>>> Kind Regards, >>>> >>>> Vic >>>> >>>> >>>>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo >>>>> > wrote: >>>>> >>>>> Anyone? >>>>> >>>>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>>>> Hello guys. 
>>>>>> Quite a while ago i mentioned that we have a big expel issue >>>>>> on our gss ( first gen) and white a lot people suggested that >>>>>> the root cause could be that we use the same interface for >>>>>> all the traffic, and that we should split the data network >>>>>> from the admin network. Finally we could plan a downtime and >>>>>> we are migrating the data out so, i can soon safelly play >>>>>> with the change, but looking what exactly i should to do i'm >>>>>> a bit puzzled. Our mmlscluster looks like this: >>>>>> >>>>>> GPFS cluster information >>>>>> ======================== >>>>>> GPFS cluster name: GSS.ebi.ac.uk >>>>>> >>>>>> GPFS cluster id: 17987981184946329605 >>>>>> GPFS UID domain: GSS.ebi.ac.uk >>>>>> >>>>>> Remote shell command: /usr/bin/ssh >>>>>> Remote file copy command: /usr/bin/scp >>>>>> >>>>>> GPFS cluster configuration servers: >>>>>> ----------------------------------- >>>>>> Primary server: gss01a.ebi.ac.uk >>>>>> >>>>>> Secondary server: gss02b.ebi.ac.uk >>>>>> >>>>>> >>>>>> Node Daemon node name IP address Admin node >>>>>> name Designation >>>>>> ----------------------------------------------------------------------- >>>>>> 1 gss01a.ebi.ac.uk >>>>>> 10.7.28.2 gss01a.ebi.ac.uk >>>>>> quorum-manager >>>>>> 2 gss01b.ebi.ac.uk >>>>>> 10.7.28.3 gss01b.ebi.ac.uk >>>>>> quorum-manager >>>>>> 3 gss02a.ebi.ac.uk >>>>>> 10.7.28.67 gss02a.ebi.ac.uk >>>>>> quorum-manager >>>>>> 4 gss02b.ebi.ac.uk >>>>>> 10.7.28.66 gss02b.ebi.ac.uk >>>>>> quorum-manager >>>>>> 5 gss03a.ebi.ac.uk >>>>>> 10.7.28.34 gss03a.ebi.ac.uk >>>>>> quorum-manager >>>>>> 6 gss03b.ebi.ac.uk >>>>>> 10.7.28.35 gss03b.ebi.ac.uk >>>>>> quorum-manager >>>>>> >>>>>> >>>>>> It was my understanding that the "admin node" should use a >>>>>> different interface ( a 1g link copper should be fine), while >>>>>> the daemon node is where the data was passing , so should >>>>>> point to the bonded 10g interfaces. but when i read the >>>>>> mmchnode man page i start to be quite confused. It says: >>>>>> >>>>>> --daemon-interface={hostname | ip_address} >>>>>> Specifies the host name or IP address _*to be used by the >>>>>> GPFS daemons for node-to-node communication*_. The host name >>>>>> or IP address must refer to the communication adapter over >>>>>> which the GPFS daemons communicate. >>>>>> Alias interfaces are not allowed. Use the original address or >>>>>> a name that is resolved by the host command to that original >>>>>> address. >>>>>> >>>>>> --admin-interface={hostname | ip_address} >>>>>> Specifies the name of the node to be used by GPFS >>>>>> administration commands when communicating between nodes. The >>>>>> admin node name must be specified as an IP address or a >>>>>> hostname that is resolved by the host command >>>>>> tothe desired IP address. If the >>>>>> keyword DEFAULT is specified, the admin interface for the >>>>>> node is set to be equal to the daemon interface for the node. >>>>>> >>>>>> What exactly means "node-to node-communications" ? >>>>>> Means DATA or also the "lease renew", and the token >>>>>> communication between the clients to get/steal the locks to >>>>>> be able to manage concurrent write to thr same file? >>>>>> Since we are getting expells ( especially when several >>>>>> clients contends the same file ) i assumed i have to split >>>>>> this type of packages from the data stream, but reading the >>>>>> documentation it looks to me that those internal comunication >>>>>> between nodes use the daemon-interface wich i suppose are >>>>>> used also for the data. 
so HOW exactly i can split them? >>>>>> >>>>>> >>>>>> Thanks in advance, >>>>>> Salvatore >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> gpfsug-discuss mailing list >>>>>> gpfsug-discuss atgpfsug.org >>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -- > This communication contains confidential information intended only for > the persons to whom it is addressed. Any other distribution, copying > or disclosure is strictly prohibited. If you have received this > communication in error, please notify the sender and delete this > e-mail message immediately. > > Le pr?sent message contient des renseignements de nature > confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute > diffusion, distribution, divulgation, utilisation ou reproduction de > la pr?sente communication, et de tout fichier qui y est joint, est > strictement interdite. Si vous avez re?u le pr?sent message > ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur > et supprimer le message de votre ordinateur et de votre serveur. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpfs.jpg Type: image/jpeg Size: 28904 bytes Desc: not available URL: From oehmes at gmail.com Wed Jul 15 15:33:11 2015 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 15 Jul 2015 14:33:11 +0000 Subject: [gpfsug-discuss] GPFS UG 10 Presentations - Sven Oehme In-Reply-To: References: Message-ID: Hi Jon, the answer is no, its an development internal tool. sven On Wed, Jul 15, 2015 at 1:20 AM Jon Bernard wrote: > If I may revive this: is trcio publicly available? > > Jon Bernard > > On Fri, May 2, 2014 at 5:06 PM, Bob Oesterlin wrote: > >> It Sven's presentation, he mentions a tools "trcio" (in >> /xcat/oehmes/gpfs-clone) >> >> Where can I find that? >> >> Bob Oesterlin >> >> >> >> On Fri, May 2, 2014 at 9:49 AM, Jez Tucker (Chair) >> wrote: >> >>> Hello all >>> >>> Firstly, thanks for the feedback we've had so far. Very much >>> appreciated. >>> >>> Secondly, GPFS UG 10 Presentations are now available on the >>> Presentations section of the website. >>> Any outstanding presentations will follow shortly. 
>>> >>> See: http://www.gpfsug.org/ >>> >>> Best regards, >>> >>> Jez >>> >>> UG Chair >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Jul 15 15:37:57 2015 From: ewahl at osc.edu (Wahl, Edward) Date: Wed, 15 Jul 2015 14:37:57 +0000 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A625BE.9000809@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> , <55A625BE.9000809@ebi.ac.uk> Message-ID: <9DA9EC7A281AC7428A9618AFDC49049955A606E4@CIO-KRC-D1MBX02.osuad.osu.edu> I don't see this in the thread but perhaps I missed it, what version are you running? I'm still on 3.5 so this is all based on that. A few notes for a little "heads up" here hoping to help with the pitfalls. I seem to recall a number of caveats when I did this a while back. Such as using the 'subnets' option being discussed, stops GPFS from failing over to other TCP networks when there are failures. VERY important! 'mmdiag --network' will show your setup. Definitely verify this if failing downwards is in your plans. We fail from 56Gb RDMA->10GbE TCP-> 1GbE here. And having had it work during some bad power events last year it was VERY nice that the users only noticed a slowdown when we completely lost Lustre and other resources. Also I recall that there was a restriction on having multiple private networks, and some special switch to force this. I have a note about "privateSubnetOverride" so you might read up about this. I seem to recall this was for TCP connections and daemonnodename being a private IP. Or maybe it was that AND mmlscluster having private IPs as well? I think the developerworks wiki had some writeup on this. I don't see it in the admin manuals. Hopefully this may help as you plan this out. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Wednesday, July 15, 2015 5:19 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] data interface and management infercace. Thanks for the input.. this is actually very interesting! Reading here: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview , specifically the " Using more than one network" part it seems to me that this way we should be able to split the lease/token/ping from the data. Supposing that I implement a GSS cluster with only NDS and a second cluster with only clients: [cid:part1.03040109.00080709 at ebi.ac.uk] As far i understood if on the NDS cluster add first the subnet 10.20.0.0/16 and then 10.30.0.0 is should use the internal network for all the node-to-node comunication, leaving the 10.30.0.0/30 only for data traffic witht he remote cluster ( the clients). 
Similarly, in the client cluster, adding first 10.10.0.0/16 and then 10.30.0.0, will guarantee than the node-to-node comunication pass trough a different interface there the data is passing. Since the client are just "clients" the traffic trough 10.10.0.0/16 should be minimal (only token ,lease, ping and so on ) and not affected by the rest. Should be possible at this point move aldo the "admin network" on the internal interface, so we effectively splitted all the "non data" traffic on a dedicated interface. I'm wondering if I'm missing something, and in case i didn't, what could be the real traffic in the internal (black) networks ( 1g link its fine or i still need 10g for that). Another thing I I'm wondering its the load of the "non data" traffic between the clusters.. i suppose some "daemon traffic" goes trough the blue interface for the inter-cluster communication. Any thoughts ? Salvatore On 13/07/15 18:19, Muhammad Habib wrote: Did you look at "subnets" parameter used with "mmchconfig" command. I think you can use order list of subnets for daemon communication and then actual daemon interface can be used for data transfer. When the GPFS will start it will use actual daemon interface for communication , however , once its started , it will use the IPs from the subnet list whichever coming first in the list. To further validate , you can put network sniffer before you do actual implementation or alternatively you can open a PMR with IBM. If your cluster having expel situation , you may fine tune your cluster e.g. increase ping timeout period , having multiple NSD servers and distributing filesystems across these NSD servers. Also critical servers can have HBA cards installed for direct I/O through fiber. Thanks On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick > wrote: Hi, Yes having separate data and management networks has been critical for us for keeping health monitoring/communication unimpeded by data movement. Not as important, but you can also tune the networks differently (packet sizes, buffer sizes, SAK, etc) which can help. Jason On Jul 13, 2015, at 7:25 AM, Vic Cornell > wrote: Hi Salvatore, I agree that that is what the manual - and some of the wiki entries say. However , when we have had problems (typically congestion) with ethernet networks in the past (20GbE or 40GbE) we have resolved them by setting up a separate ?Admin? network. The before and after cluster health we have seen measured in number of expels and waiters has been very marked. Maybe someone ?in the know? could comment on this split. Regards, Vic On 13 Jul 2015, at 14:29, Salvatore Di Nardo > wrote: Hello Vic. We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. 
The host name or IP address must refer to the commu- nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. Regards, Salvatore On 13/07/15 14:00, Vic Cornell wrote: Hi Salavatore, Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. What about the clients? Can they also connect to a separate admin network? Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. Kind Regards, Vic On 13 Jul 2015, at 13:31, Salvatore Di Nardo > wrote: Anyone? On 10/07/15 11:07, Salvatore Di Nardo wrote: Hello guys. Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. 
The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. What exactly means "node-to node-communications" ? Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? Thanks in advance, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- This communication contains confidential information intended only for the persons to whom it is addressed. Any other distribution, copying or disclosure is strictly prohibited. If you have received this communication in error, please notify the sender and delete this e-mail message immediately. Le pr?sent message contient des renseignements de nature confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, distribution, divulgation, utilisation ou reproduction de la pr?sente communication, et de tout fichier qui y est joint, est strictement interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur et supprimer le message de votre ordinateur et de votre serveur. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpfs.jpg Type: image/jpeg Size: 28904 bytes Desc: gpfs.jpg URL: From S.J.Thompson at bham.ac.uk Sun Jul 19 11:45:09 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sun, 19 Jul 2015 10:45:09 +0000 Subject: [gpfsug-discuss] 4.1.1 immutable filesets Message-ID: I was wondering if anyone had looked at the immutable fileset features in 4.1.1? In particular I was looking at the iam compliant mode, but I've a couple of questions. * if I have an iam compliant fileset, and it contains immutable files or directories, can I still unlink and delete the filset? * will HSM work with immutable files? I.e. Can I migrate files to tape and restore them? The docs mention that extended attributes can be updated internally by dmapi, so I guess HSM might work? 
Thanks Simon From kraemerf at de.ibm.com Sun Jul 19 13:45:35 2015 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Sun, 19 Jul 2015 14:45:35 +0200 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: Message-ID: >I was wondering if anyone had looked at the immutable fileset features in 4.1.1? yes, Nils Haustein has see: https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insight_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From S.J.Thompson at bham.ac.uk Sun Jul 19 14:35:47 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sun, 19 Jul 2015 13:35:47 +0000 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: , Message-ID: Hi Frank, Yeah id read that this.morning, which is why I was asking... I couldn't see anything about HSM in there or if its possible to delete a fileset with immutable files. I remember Scott (maybe) mentioning it at the gpfs ug meeting in York, but I thought that was immutable file systems, which you have to destroy. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Frank Kraemer [kraemerf at de.ibm.com] Sent: 19 July 2015 13:45 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Immutable fileset features >I was wondering if anyone had looked at the immutable fileset features in 4.1.1? yes, Nils Haustein has see: https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insight_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Sun Jul 19 21:09:26 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sun, 19 Jul 2015 20:09:26 +0000 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: Message-ID: On 19/07/2015 13:45, "Frank Kraemer" wrote: >>I was wondering if anyone had looked at the immutable fileset features in >4.1.1? > >yes, Nils Haustein has see: > >https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insi >ght_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function I was re-reading some of this blog post and am a little confused. It talks about setting retention times by setting the ATIME from touch, or by using -E to mmchattr. Does that mean if a file is accessed, then the ATIME is updated and so the retention period is changed? What if our retention policy is based on last access time of file +period of time. I was thinking it would be useful to do a policy scan to find newly access files and then set the retention (either directly by policy if possible? Or by passing the file list to a script). Would this work or if the ATIME is overloaded, then I guess we can?t use this? Finally, is this a feature that is supported by IBM? 
The -E flag for mmchattr is neither in the man page nor the online docs at: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect rum.scale.v4r11.adm.doc/bl1adm_mmchattr.htm (My possibly incorrect understanding was that if its documented, then is supported, otherwise it might work)? Simon From jamiedavis at us.ibm.com Mon Jul 20 13:26:17 2015 From: jamiedavis at us.ibm.com (James Davis) Date: Mon, 20 Jul 2015 08:26:17 -0400 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: Message-ID: <201507200027.t6K0RD8b003417@d01av02.pok.ibm.com> Simon, I spoke to a tester who worked on this line item. She thinks mmchattr -E should have been documented. We will follow up. If it was an oversight it should be corrected soon. Jamie Jamie Davis GPFS Functional Verification Test (FVT) jamiedavis at us.ibm.com From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 19-07-15 04:09 PM Subject: Re: [gpfsug-discuss] Immutable fileset features Sent by: gpfsug-discuss-bounces at gpfsug.org On 19/07/2015 13:45, "Frank Kraemer" wrote: >>I was wondering if anyone had looked at the immutable fileset features in >4.1.1? > >yes, Nils Haustein has see: > >https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insi >ght_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function I was re-reading some of this blog post and am a little confused. It talks about setting retention times by setting the ATIME from touch, or by using -E to mmchattr. Does that mean if a file is accessed, then the ATIME is updated and so the retention period is changed? What if our retention policy is based on last access time of file +period of time. I was thinking it would be useful to do a policy scan to find newly access files and then set the retention (either directly by policy if possible? Or by passing the file list to a script). Would this work or if the ATIME is overloaded, then I guess we can?t use this? Finally, is this a feature that is supported by IBM? The -E flag for mmchattr is neither in the man page nor the online docs at: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect rum.scale.v4r11.adm.doc/bl1adm_mmchattr.htm (My possibly incorrect understanding was that if its documented, then is supported, otherwise it might work)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Luke.Raimbach at crick.ac.uk Mon Jul 20 08:02:01 2015 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Mon, 20 Jul 2015 07:02:01 +0000 Subject: [gpfsug-discuss] 4.1.1 immutable filesets In-Reply-To: References: Message-ID: Can I add to this list of questions? Apparently, one cannot set immutable, or append-only attributes on files / directories within an AFM cache. However, if I have an independent writer and set immutability at home, what does the AFM IW cache do about this? Or does this restriction just apply to entire filesets (which would make more sense)? Cheers, Luke. 
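Putting the earlier posts in this thread together, a hedged sketch of the retention flow the blog post describes (the file system name, fileset, path and date are invented, and --iam-mode / -E are exactly the options worth re-checking against the 4.1.1 documentation for your level):

  # Put the fileset into an IAM mode once (syntax as I understand it from the blog):
  mmchfileset gpfs1 archive --iam-mode compliant
  # Set the desired retention time as the file's atime, then make it immutable:
  touch -a -t 201607011200 /gpfs/gpfs1/archive/report.pdf
  mmchattr -i yes /gpfs/gpfs1/archive/report.pdf
  # mmlsattr -L shows the immutability flags (and, on newer levels, the expiration time):
  mmlsattr -L /gpfs/gpfs1/archive/report.pdf
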
-----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: 19 July 2015 11:45 To: gpfsug main discussion list Subject: [gpfsug-discuss] 4.1.1 immutable filesets I was wondering if anyone had looked at the immutable fileset features in 4.1.1? In particular I was looking at the iam compliant mode, but I've a couple of questions. * if I have an iam compliant fileset, and it contains immutable files or directories, can I still unlink and delete the filset? * will HSM work with immutable files? I.e. Can I migrate files to tape and restore them? The docs mention that extended attributes can be updated internally by dmapi, so I guess HSM might work? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From kallbac at iu.edu Wed Jul 22 11:50:58 2015 From: kallbac at iu.edu (Kristy Kallback-Rose) Date: Wed, 22 Jul 2015 06:50:58 -0400 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Message-ID: <203758A6-C7E0-4D3F-BA31-A130CF92DCBC@iu.edu> Yes interested, please post. We?ll probably keep running Samba separately, as we do today, for quite some time, but will be facing this transition at some point so we can be supported by IBM for Samba. On Jul 10, 2015, at 8:06 AM, Simon Thompson (Research Computing - IT Services) wrote: > So IBM came back and said what I was doing wasn?t supported. > > They did say that you can use ?user defined? authentication. Which I?ve > got working now on my environment (figured what I was doing wrong, and you > can?t use mmsmb to do some of the bits I need for it to work for user > defined mode for me...). But I still think it needs a patch to one of the > files for CES for use in user defined authentication. (Right now it > appears to remove all my ?user defined? settings from nsswitch.conf when > you stop CES/GPFS on a node). I?ve supplied my patch to IBM which works > for my case, we?ll see what they do about it? > > (If people are interested, I?ll gather my notes into a blog post). > > Simon > > On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: > >> Just to chime in as another interested party, we do something fairly >> similar but use sssd instead of nslcd. Very interested to see how >> accommodating the IBM Samba is to local configuration needs. >> >> Best, >> Kristy >> >> On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >> Services) wrote: >> >>> Hi, >>> >>> (sorry, lots of questions about this stuff at the moment!) >>> >>> I?m currently looking at removing the sernet smb configs we had >>> previously >>> and moving to IBM SMB. I?ve removed all the old packages and only now >>> have >>> gpfs.smb installed on the systems. >>> >>> I?m struggling to get the config tools to work for our environment. >>> >>> We have MS Windows AD Domain for authentication. For various reasons, >>> however doesn?t hold the UIDs/GIDs, which are instead held in a >>> different >>> LDAP directory. >>> >>> In the past, we?d configure the Linux servers running Samba so that >>> NSLCD >>> was configured to get details from the LDAP server. (e.g. 
getent passwd >>> would return the data for an AD user). The Linux boxes would also be >>> configured to use KRB5 authentication where users were allowed to ssh >>> etc >>> in for password authentication. >>> >>> So as far as Samba was concerned, it would do ?security = ADS? and then >>> we?d also have "idmap config * : backend = tdb2? >>> >>> I.e. Use Domain for authentication, but look locally for ID mapping >>> data. >>> >>> Now I can configured IBM SMB to use ADS for authentication: >>> >>> mmuserauth service create --type ad --data-access-method file >>> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >>> --idmap-role subordinate >>> >>> >>> However I can?t see anyway for me to manipulate the config so that it >>> doesn?t use autorid. Using this we end up with: >>> >>> mmsmb config list | grep -i idmap >>> idmap config * : backend autorid >>> idmap config * : range 10000000-299999999 >>> idmap config * : rangesize 1000000 >>> idmap config * : read only yes >>> idmap:cache no >>> >>> >>> It also adds: >>> >>> mmsmb config list | grep -i auth >>> auth methods guest sam winbind >>> >>> (though I don?t think that is a problem). >>> >>> >>> I also can?t change the idmap using the mmsmb command (I think would >>> look >>> like this): >>> # mmsmb config change --option="idmap config * : backend=tdb2" >>> idmap config * : backend=tdb2: [E] Unsupported smb option. More >>> information about smb options is availabe in the man page. >>> >>> >>> >>> I can?t see anything in the docs at: >>> >>> http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>> ct >>> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >>> >>> That give me a clue how to do what I want. >>> >>> I?d be happy to do some mixture of AD for authentication and LDAP for >>> lookups (rather than just falling back to ?local? from nslcd), but I >>> can?t >>> see a way to do this, and ?manual? seems to stop ADS authentication in >>> Samba. >>> >>> Anyone got any suggestions? >>> >>> >>> Thanks >>> >>> Simon >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 495 bytes Desc: Message signed with OpenPGP using GPGMail URL: From S.J.Thompson at bham.ac.uk Wed Jul 22 11:59:56 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 22 Jul 2015 10:59:56 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: <203758A6-C7E0-4D3F-BA31-A130CF92DCBC@iu.edu> References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> , <203758A6-C7E0-4D3F-BA31-A130CF92DCBC@iu.edu> Message-ID: Hi Kristy, Funny you should ask, I wrote it up last night... http://www.roamingzebra.co.uk/2015/07/smb-protocol-support-with-spectrum.html They did tell me it was all tested with Samba 4, so should work, subject to you checking your own smb config options. 
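In case it helps anyone following the same route, the "user defined" mode mentioned above boils down to something like this (hedged; check the 4.1.1 mmuserauth man page for the exact keyword on your level), after which nslcd/sssd and krb5 stay under your own control on the protocol nodes:

  # Tell CES not to manage file authentication / ID mapping itself:
  mmuserauth service create --data-access-method file --type userdefined
  # Confirm what is configured:
  mmuserauth service list
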
But i like not having to build it myself now ;) The move was actually pretty easy and in theory you can run mixed over existing nodes and upgraded protocol nodes, but you might need a different clustered name. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Kristy Kallback-Rose [kallbac at iu.edu] Sent: 22 July 2015 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SMB support and config Yes interested, please post. We?ll probably keep running Samba separately, as we do today, for quite some time, but will be facing this transition at some point so we can be supported by IBM for Samba. On Jul 10, 2015, at 8:06 AM, Simon Thompson (Research Computing - IT Services) wrote: > So IBM came back and said what I was doing wasn?t supported. > > They did say that you can use ?user defined? authentication. Which I?ve > got working now on my environment (figured what I was doing wrong, and you > can?t use mmsmb to do some of the bits I need for it to work for user > defined mode for me...). But I still think it needs a patch to one of the > files for CES for use in user defined authentication. (Right now it > appears to remove all my ?user defined? settings from nsswitch.conf when > you stop CES/GPFS on a node). I?ve supplied my patch to IBM which works > for my case, we?ll see what they do about it? > > (If people are interested, I?ll gather my notes into a blog post). > > Simon > > On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: > >> Just to chime in as another interested party, we do something fairly >> similar but use sssd instead of nslcd. Very interested to see how >> accommodating the IBM Samba is to local configuration needs. >> >> Best, >> Kristy >> >> On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >> Services) wrote: >> >>> Hi, >>> >>> (sorry, lots of questions about this stuff at the moment!) >>> >>> I?m currently looking at removing the sernet smb configs we had >>> previously >>> and moving to IBM SMB. I?ve removed all the old packages and only now >>> have >>> gpfs.smb installed on the systems. >>> >>> I?m struggling to get the config tools to work for our environment. >>> >>> We have MS Windows AD Domain for authentication. For various reasons, >>> however doesn?t hold the UIDs/GIDs, which are instead held in a >>> different >>> LDAP directory. >>> >>> In the past, we?d configure the Linux servers running Samba so that >>> NSLCD >>> was configured to get details from the LDAP server. (e.g. getent passwd >>> would return the data for an AD user). The Linux boxes would also be >>> configured to use KRB5 authentication where users were allowed to ssh >>> etc >>> in for password authentication. >>> >>> So as far as Samba was concerned, it would do ?security = ADS? and then >>> we?d also have "idmap config * : backend = tdb2? >>> >>> I.e. Use Domain for authentication, but look locally for ID mapping >>> data. >>> >>> Now I can configured IBM SMB to use ADS for authentication: >>> >>> mmuserauth service create --type ad --data-access-method file >>> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >>> --idmap-role subordinate >>> >>> >>> However I can?t see anyway for me to manipulate the config so that it >>> doesn?t use autorid. 
Using this we end up with: >>> >>> mmsmb config list | grep -i idmap >>> idmap config * : backend autorid >>> idmap config * : range 10000000-299999999 >>> idmap config * : rangesize 1000000 >>> idmap config * : read only yes >>> idmap:cache no >>> >>> >>> It also adds: >>> >>> mmsmb config list | grep -i auth >>> auth methods guest sam winbind >>> >>> (though I don?t think that is a problem). >>> >>> >>> I also can?t change the idmap using the mmsmb command (I think would >>> look >>> like this): >>> # mmsmb config change --option="idmap config * : backend=tdb2" >>> idmap config * : backend=tdb2: [E] Unsupported smb option. More >>> information about smb options is availabe in the man page. >>> >>> >>> >>> I can?t see anything in the docs at: >>> >>> http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>> ct >>> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >>> >>> That give me a clue how to do what I want. >>> >>> I?d be happy to do some mixture of AD for authentication and LDAP for >>> lookups (rather than just falling back to ?local? from nslcd), but I >>> can?t >>> see a way to do this, and ?manual? seems to stop ADS authentication in >>> Samba. >>> >>> Anyone got any suggestions? >>> >>> >>> Thanks >>> >>> Simon >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mhabib73 at gmail.com Wed Jul 22 13:58:51 2015 From: mhabib73 at gmail.com (Muhammad Habib) Date: Wed, 22 Jul 2015 08:58:51 -0400 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A625BE.9000809@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> <55A625BE.9000809@ebi.ac.uk> Message-ID: did you implement it ? looks ok. All daemon traffic should be going through black network including inter-cluster daemon traffic ( assume black subnet routable). All data traffic should be going through the blue network. You may need to run iptrace or tcpdump to make sure proper network are in use. You can always open a PMR if you having issue during the configuration . Thanks On Wed, Jul 15, 2015 at 5:19 AM, Salvatore Di Nardo wrote: > Thanks for the input.. this is actually very interesting! > > Reading here: > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview > , > specifically the " Using more than one network" part it seems to me that > this way we should be able to split the lease/token/ping from the data. > > Supposing that I implement a GSS cluster with only NDS and a second > cluster with only clients: > > > > As far i understood if on the NDS cluster add first the subnet > 10.20.0.0/16 and then 10.30.0.0 is should use the internal network for > all the node-to-node comunication, leaving the 10.30.0.0/30 only for data > traffic witht he remote cluster ( the clients). 
Similarly, in the client > cluster, adding first 10.10.0.0/16 and then 10.30.0.0, will guarantee > than the node-to-node comunication pass trough a different interface there > the data is passing. Since the client are just "clients" the traffic trough > 10.10.0.0/16 should be minimal (only token ,lease, ping and so on ) and > not affected by the rest. Should be possible at this point move aldo the > "admin network" on the internal interface, so we effectively splitted all > the "non data" traffic on a dedicated interface. > > I'm wondering if I'm missing something, and in case i didn't, what could > be the real traffic in the internal (black) networks ( 1g link its fine or > i still need 10g for that). Another thing I I'm wondering its the load of > the "non data" traffic between the clusters.. i suppose some "daemon > traffic" goes trough the blue interface for the inter-cluster > communication. > > > Any thoughts ? > > Salvatore > > On 13/07/15 18:19, Muhammad Habib wrote: > > Did you look at "subnets" parameter used with "mmchconfig" command. I > think you can use order list of subnets for daemon communication and then > actual daemon interface can be used for data transfer. When the GPFS will > start it will use actual daemon interface for communication , however , > once its started , it will use the IPs from the subnet list whichever > coming first in the list. To further validate , you can put network > sniffer before you do actual implementation or alternatively you can open a > PMR with IBM. > > If your cluster having expel situation , you may fine tune your cluster > e.g. increase ping timeout period , having multiple NSD servers and > distributing filesystems across these NSD servers. Also critical servers > can have HBA cards installed for direct I/O through fiber. > > Thanks > > On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick wrote: > >> Hi, >> >> Yes having separate data and management networks has been critical for >> us for keeping health monitoring/communication unimpeded by data movement. >> >> Not as important, but you can also tune the networks differently >> (packet sizes, buffer sizes, SAK, etc) which can help. >> >> Jason >> >> On Jul 13, 2015, at 7:25 AM, Vic Cornell wrote: >> >> Hi Salvatore, >> >> I agree that that is what the manual - and some of the wiki entries say. >> >> However , when we have had problems (typically congestion) with >> ethernet networks in the past (20GbE or 40GbE) we have resolved them by >> setting up a separate ?Admin? network. >> >> The before and after cluster health we have seen measured in number of >> expels and waiters has been very marked. >> >> Maybe someone ?in the know? could comment on this split. >> >> Regards, >> >> Vic >> >> >> On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: >> >> Hello Vic. >> We are currently draining our gpfs to do all the recabling to add a >> management network, but looking what the admin interface does ( man >> mmchnode ) it says something different: >> >> --admin-interface={hostname | ip_address} >> Specifies the name of the node to be used by >> GPFS administration commands when communicating between nodes. The admin >> node name must be specified as an IP >> address or a hostname that is resolved by the >> host command to the desired IP address. If the keyword DEFAULT is >> specified, the admin interface for the >> node is set to be equal to the daemon interface >> for the node. >> >> >> So, seems used only for commands propagation, hence have nothing to do >> with the node-to-node traffic. 
Infact the other interface description is: >> >> --daemon-interface={hostname | ip_address} >> Specifies the host name or IP address *to be >> used by the GPFS daemons for node-to-node communication*. The host name >> or IP address must refer to the commu- >> nication adapter over which the GPFS daemons >> communicate. Alias interfaces are not allowed. Use the original address or >> a name that is resolved by the >> host command to that original address. >> >> >> The "expired lease" issue and file locking mechanism a( most of our >> expells happens when 2 clients try to write in the same file) are exactly >> node-to node-comunication, so im wondering what's the point to separate >> the "admin network". I want to be sure to plan the right changes before we >> do a so massive task. We are talking about adding a new interface on 700 >> clients, so the recabling work its not small. >> >> >> Regards, >> Salvatore >> >> >> >> On 13/07/15 14:00, Vic Cornell wrote: >> >> Hi Salavatore, >> >> Does your GSS have the facility for a 1GbE ?management? network? If so >> I think that changing the ?admin? node names of the cluster members to a >> set of IPs on the management network would give you the split that you need. >> >> What about the clients? Can they also connect to a separate admin >> network? >> >> Remember that if you are using multi-cluster all of the nodes in both >> networks must share the same admin network. >> >> Kind Regards, >> >> Vic >> >> >> On 13 Jul 2015, at 13:31, Salvatore Di Nardo wrote: >> >> Anyone? >> >> On 10/07/15 11:07, Salvatore Di Nardo wrote: >> >> Hello guys. >> Quite a while ago i mentioned that we have a big expel issue on our gss >> ( first gen) and white a lot people suggested that the root cause could be >> that we use the same interface for all the traffic, and that we should >> split the data network from the admin network. Finally we could plan a >> downtime and we are migrating the data out so, i can soon safelly play with >> the change, but looking what exactly i should to do i'm a bit puzzled. Our >> mmlscluster looks like this: >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> It was my understanding that the "admin node" should use a different >> interface ( a 1g link copper should be fine), while the daemon node is >> where the data was passing , so should point to the bonded 10g interfaces. >> but when i read the mmchnode man page i start to be quite confused. It says: >> >> --daemon-interface={hostname | ip_address} >> Specifies the host name or IP address *to be >> used by the GPFS daemons for node-to-node communication*. 
The host name >> or IP address must refer to the communication adapter over which the GPFS >> daemons communicate. >> Alias interfaces are not allowed. Use the >> original address or a name that is resolved by the host command to that >> original address. >> >> --admin-interface={hostname | ip_address} >> Specifies the name of the node to be used by >> GPFS administration commands when communicating between nodes. The admin >> node name must be specified as an IP address or a hostname that is resolved >> by the host command >> to the desired IP address. If the keyword >> DEFAULT is specified, the admin interface for the node is set to be equal >> to the daemon interface for the node. >> >> What exactly means "node-to node-communications" ? >> Means DATA or also the "lease renew", and the token communication between >> the clients to get/steal the locks to be able to manage concurrent write to >> thr same file? >> Since we are getting expells ( especially when several clients contends >> the same file ) i assumed i have to split this type of packages from the >> data stream, but reading the documentation it looks to me that those >> internal comunication between nodes use the daemon-interface wich i suppose >> are used also for the data. so HOW exactly i can split them? >> >> >> Thanks in advance, >> Salvatore >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > > -- > This communication contains confidential information intended only for the > persons to whom it is addressed. Any other distribution, copying or > disclosure is strictly prohibited. If you have received this communication > in error, please notify the sender and delete this e-mail message > immediately. > > Le pr?sent message contient des renseignements de nature confidentielle > r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, > distribution, divulgation, utilisation ou reproduction de la pr?sente > communication, et de tout fichier qui y est joint, est strictement > interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, > veuillez informer imm?diatement l'exp?diteur et supprimer le message de > votre ordinateur et de votre serveur. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This communication contains confidential information intended only for the persons to whom it is addressed. Any other distribution, copying or disclosure is strictly prohibited. 
If you have received this communication in error, please notify the sender and delete this e-mail message immediately. Le pr?sent message contient des renseignements de nature confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, distribution, divulgation, utilisation ou reproduction de la pr?sente communication, et de tout fichier qui y est joint, est strictement interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur et supprimer le message de votre ordinateur et de votre serveur. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpfs.jpg Type: image/jpeg Size: 28904 bytes Desc: not available URL: From sdinardo at ebi.ac.uk Wed Jul 22 14:51:04 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 22 Jul 2015 14:51:04 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> <55A625BE.9000809@ebi.ac.uk> Message-ID: <55AF9FC8.6050107@ebi.ac.uk> Hello, no, still didn't anything because we have to drain 2PB data , into a slower storage.. so it will take few weeks. I expect doing it the second half of August. Will let you all know the results once done and properly tested. Salvatore On 22/07/15 13:58, Muhammad Habib wrote: > did you implement it ? looks ok. All daemon traffic should be going > through black network including inter-cluster daemon traffic ( assume > black subnet routable). All data traffic should be going through the > blue network. You may need to run iptrace or tcpdump to make sure > proper network are in use. You can always open a PMR if you having > issue during the configuration . > > Thanks > > On Wed, Jul 15, 2015 at 5:19 AM, Salvatore Di Nardo > > wrote: > > Thanks for the input.. this is actually very interesting! > > Reading here: > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview > > , > specifically the " Using more than one network" part it seems to > me that this way we should be able to split the lease/token/ping > from the data. > > Supposing that I implement a GSS cluster with only NDS and a > second cluster with only clients: > > > > As far i understood if on the NDS cluster add first the subnet > 10.20.0.0/16 and then 10.30.0.0 is should > use the internal network for all the node-to-node comunication, > leaving the 10.30.0.0/30 only for data > traffic witht he remote cluster ( the clients). Similarly, in the > client cluster, adding first 10.10.0.0/16 > and then 10.30.0.0, will guarantee than the node-to-node > comunication pass trough a different interface there the data is > passing. Since the client are just "clients" the traffic trough > 10.10.0.0/16 should be minimal (only token > ,lease, ping and so on ) and not affected by the rest. Should be > possible at this point move aldo the "admin network" on the > internal interface, so we effectively splitted all the "non data" > traffic on a dedicated interface. > > I'm wondering if I'm missing something, and in case i didn't, what > could be the real traffic in the internal (black) networks ( 1g > link its fine or i still need 10g for that). 
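For illustration, that ordering would be expressed roughly as below (subnet values are the ones from the diagram, and the exact subnets syntax - including the optional /cluster-name qualifier - should be checked against the mmchconfig man page for your release; the setting only takes effect once the daemons are restarted):

# on the NSD/GSS cluster: prefer the internal 10.20.0.0 network for daemon-to-daemon traffic
mmchconfig subnets="10.20.0.0 10.30.0.0"
# on the client cluster: prefer its own internal 10.10.0.0 network
mmchconfig subnets="10.10.0.0 10.30.0.0"
# after restart, check which addresses the established connections really use
mmdiag --network

Each node prefers the first listed subnet that it shares with its peer, so intra-cluster traffic should stay on the internal networks and only the client-to-NSD traffic falls back to 10.30.0.0.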
Another thing I I'm > wondering its the load of the "non data" traffic between the > clusters.. i suppose some "daemon traffic" goes trough the blue > interface for the inter-cluster communication. > > > Any thoughts ? > > Salvatore > > On 13/07/15 18:19, Muhammad Habib wrote: >> Did you look at "subnets" parameter used with "mmchconfig" >> command. I think you can use order list of subnets for daemon >> communication and then actual daemon interface can be used for >> data transfer. When the GPFS will start it will use actual >> daemon interface for communication , however , once its started , >> it will use the IPs from the subnet list whichever coming first >> in the list. To further validate , you can put network sniffer >> before you do actual implementation or alternatively you can open >> a PMR with IBM. >> >> If your cluster having expel situation , you may fine tune your >> cluster e.g. increase ping timeout period , having multiple NSD >> servers and distributing filesystems across these NSD servers. >> Also critical servers can have HBA cards installed for direct I/O >> through fiber. >> >> Thanks >> >> On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick > > wrote: >> >> Hi, >> >> Yes having separate data and management networks has been >> critical for us for keeping health monitoring/communication >> unimpeded by data movement. >> >> Not as important, but you can also tune the networks >> differently (packet sizes, buffer sizes, SAK, etc) which can >> help. >> >> Jason >> >> On Jul 13, 2015, at 7:25 AM, Vic Cornell >> > wrote: >> >>> Hi Salvatore, >>> >>> I agree that that is what the manual - and some of the wiki >>> entries say. >>> >>> However , when we have had problems (typically congestion) >>> with ethernet networks in the past (20GbE or 40GbE) we have >>> resolved them by setting up a separate ?Admin? network. >>> >>> The before and after cluster health we have seen measured in >>> number of expels and waiters has been very marked. >>> >>> Maybe someone ?in the know? could comment on this split. >>> >>> Regards, >>> >>> Vic >>> >>> >>>> On 13 Jul 2015, at 14:29, Salvatore Di Nardo >>>> > wrote: >>>> >>>> Hello Vic. >>>> We are currently draining our gpfs to do all the recabling >>>> to add a management network, but looking what the admin >>>> interface does ( man mmchnode ) it says something different: >>>> >>>> --admin-interface={hostname | ip_address} >>>> Specifies the name of the node to be used by GPFS >>>> administration commands when communicating between >>>> nodes. The admin node name must be specified as an IP >>>> address or a hostname that is resolved by the host >>>> command to the desired IP address. If the keyword >>>> DEFAULT is specified, the admin interface for the >>>> node is set to be equal to the daemon interface for >>>> the node. >>>> >>>> >>>> So, seems used only for commands propagation, hence have >>>> nothing to do with the node-to-node traffic. Infact the >>>> other interface description is: >>>> >>>> --daemon-interface={hostname | ip_address} >>>> Specifies the host name or IP address _*to be used >>>> by the GPFS daemons for node-to-node >>>> communication*_. The host name or IP address must >>>> refer to the commu- >>>> nication adapter over which the GPFS daemons >>>> communicate. Alias interfaces are not allowed. Use >>>> the original address or a name that is resolved >>>> by the >>>> host command to that original address. 
>>>> >>>> >>>> The "expired lease" issue and file locking mechanism a( >>>> most of our expells happens when 2 clients try to write in >>>> the same file) are exactly node-to node-comunication, so >>>> im wondering what's the point to separate the "admin >>>> network". I want to be sure to plan the right changes >>>> before we do a so massive task. We are talking about adding >>>> a new interface on 700 clients, so the recabling work its >>>> not small. >>>> >>>> >>>> Regards, >>>> Salvatore >>>> >>>> >>>> >>>> On 13/07/15 14:00, Vic Cornell wrote: >>>>> Hi Salavatore, >>>>> >>>>> Does your GSS have the facility for a 1GbE ?management? >>>>> network? If so I think that changing the ?admin? node >>>>> names of the cluster members to a set of IPs on the >>>>> management network would give you the split that you need. >>>>> >>>>> What about the clients? Can they also connect to a >>>>> separate admin network? >>>>> >>>>> Remember that if you are using multi-cluster all of the >>>>> nodes in both networks must share the same admin network. >>>>> >>>>> Kind Regards, >>>>> >>>>> Vic >>>>> >>>>> >>>>>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo >>>>>> > wrote: >>>>>> >>>>>> Anyone? >>>>>> >>>>>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>>>>> Hello guys. >>>>>>> Quite a while ago i mentioned that we have a big expel >>>>>>> issue on our gss ( first gen) and white a lot people >>>>>>> suggested that the root cause could be that we use the >>>>>>> same interface for all the traffic, and that we should >>>>>>> split the data network from the admin network. Finally >>>>>>> we could plan a downtime and we are migrating the data >>>>>>> out so, i can soon safelly play with the change, but >>>>>>> looking what exactly i should to do i'm a bit puzzled. >>>>>>> Our mmlscluster looks like this: >>>>>>> >>>>>>> GPFS cluster information >>>>>>> ======================== >>>>>>> GPFS cluster name: GSS.ebi.ac.uk >>>>>>> >>>>>>> GPFS cluster id: 17987981184946329605 >>>>>>> GPFS UID domain: GSS.ebi.ac.uk >>>>>>> >>>>>>> Remote shell command: /usr/bin/ssh >>>>>>> Remote file copy command: /usr/bin/scp >>>>>>> >>>>>>> GPFS cluster configuration servers: >>>>>>> ----------------------------------- >>>>>>> Primary server: gss01a.ebi.ac.uk >>>>>>> >>>>>>> Secondary server: gss02b.ebi.ac.uk >>>>>>> >>>>>>> >>>>>>> Node Daemon node name IP address Admin >>>>>>> node name Designation >>>>>>> ----------------------------------------------------------------------- >>>>>>> 1 gss01a.ebi.ac.uk >>>>>>> 10.7.28.2 >>>>>>> gss01a.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 2 gss01b.ebi.ac.uk >>>>>>> 10.7.28.3 >>>>>>> gss01b.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 3 gss02a.ebi.ac.uk >>>>>>> 10.7.28.67 >>>>>>> gss02a.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 4 gss02b.ebi.ac.uk >>>>>>> 10.7.28.66 >>>>>>> gss02b.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 5 gss03a.ebi.ac.uk >>>>>>> 10.7.28.34 >>>>>>> gss03a.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 6 gss03b.ebi.ac.uk >>>>>>> 10.7.28.35 >>>>>>> gss03b.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> >>>>>>> >>>>>>> It was my understanding that the "admin node" should use >>>>>>> a different interface ( a 1g link copper should be >>>>>>> fine), while the daemon node is where the data was >>>>>>> passing , so should point to the bonded 10g interfaces. >>>>>>> but when i read the mmchnode man page i start to be >>>>>>> quite confused. 
It says: >>>>>>> >>>>>>> --daemon-interface={hostname | ip_address} >>>>>>> Specifies the host name or IP address _*to be used by >>>>>>> the GPFS daemons for node-to-node communication*_. The >>>>>>> host name or IP address must refer to the communication >>>>>>> adapter over which the GPFS daemons communicate. >>>>>>> Alias interfaces are not allowed. Use the >>>>>>> original address or a name that is resolved by the host >>>>>>> command to that original address. >>>>>>> >>>>>>> --admin-interface={hostname | ip_address} >>>>>>> Specifies the name of the node to be used by GPFS >>>>>>> administration commands when communicating between >>>>>>> nodes. The admin node name must be specified as an IP >>>>>>> address or a hostname that is resolved by the host command >>>>>>> tothe desired IP address. If the keyword >>>>>>> DEFAULT is specified, the admin interface for the node >>>>>>> is set to be equal to the daemon interface for the node. >>>>>>> >>>>>>> What exactly means "node-to node-communications" ? >>>>>>> Means DATA or also the "lease renew", and the token >>>>>>> communication between the clients to get/steal the locks >>>>>>> to be able to manage concurrent write to thr same file? >>>>>>> Since we are getting expells ( especially when several >>>>>>> clients contends the same file ) i assumed i have to >>>>>>> split this type of packages from the data stream, but >>>>>>> reading the documentation it looks to me that those >>>>>>> internal comunication between nodes use the >>>>>>> daemon-interface wich i suppose are used also for the >>>>>>> data. so HOW exactly i can split them? >>>>>>> >>>>>>> >>>>>>> Thanks in advance, >>>>>>> Salvatore >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> gpfsug-discuss mailing list >>>>>>> gpfsug-discuss atgpfsug.org >>>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>>> >>>>>> _______________________________________________ >>>>>> gpfsug-discuss mailing list >>>>>> gpfsug-discuss at gpfsug.org >>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> -- >> This communication contains confidential information intended >> only for the persons to whom it is addressed. Any other >> distribution, copying or disclosure is strictly prohibited. If >> you have received this communication in error, please notify the >> sender and delete this e-mail message immediately. >> >> Le pr?sent message contient des renseignements de nature >> confidentielle r?serv?s uniquement ? l'usage du destinataire. >> Toute diffusion, distribution, divulgation, utilisation ou >> reproduction de la pr?sente communication, et de tout fichier qui >> y est joint, est strictement interdite. Si vous avez re?u le >> pr?sent message ?lectronique par erreur, veuillez informer >> imm?diatement l'exp?diteur et supprimer le message de votre >> ordinateur et de votre serveur. 
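Coming back to the admin/daemon split discussed above: once the 1GbE management ports are cabled, the change itself is one command per node and would look something like this (the -mgt hostnames are invented for the example; check the mmchnode man page for whether GPFS has to be down on the nodes while the interface is changed):

mmchnode --admin-interface=gss01a-mgt.ebi.ac.uk -N gss01a.ebi.ac.uk
mmchnode --admin-interface=gss01b-mgt.ebi.ac.uk -N gss01b.ebi.ac.uk
# ...and so on for the remaining servers and the clients
mmlscluster    # the 'Admin node name' column should now show the -mgt names

As the man page text quoted above says, this only moves the administration-command traffic; daemon-to-daemon traffic (leases, tokens, data) stays on the daemon interface unless the subnets mechanism is used as well.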
>> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss atgpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -- > This communication contains confidential information intended only for > the persons to whom it is addressed. Any other distribution, copying > or disclosure is strictly prohibited. If you have received this > communication in error, please notify the sender and delete this > e-mail message immediately. > > Le pr?sent message contient des renseignements de nature > confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute > diffusion, distribution, divulgation, utilisation ou reproduction de > la pr?sente communication, et de tout fichier qui y est joint, est > strictement interdite. Si vous avez re?u le pr?sent message > ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur > et supprimer le message de votre ordinateur et de votre serveur. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 28904 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Jul 27 22:24:11 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 27 Jul 2015 21:24:11 +0000 Subject: [gpfsug-discuss] GPFS and Community Scientific Cloud Message-ID: Hi Ulf, Thanks for the email, as suggested, I'm copying this to the GPFS UG mailing list as well as I'm sure the discussion is of interest to others. I guess what we're looking to do is to have arbitrary VMs running provided by users (I.e. Completely untrusted), but to provide them a way to get secure access to only their data. Right now we can't give them a GPFS client as this is too trusting, I was wondering how easy it would be for us to implement something like: User has a VM User runs 'kinit user at DOMAIN' to gain kerberos ticket and can then securely gain access to only their files from my NFS server. I also mentioned Janet ASSENT, which is a relatively recent project: https://jisc.ac.uk/assent (It was piloted as Janet Moonshot). Which builds on top of SAML to provide other software access to federation. My understanding is that site-specific UID mapping is needed (e.g. On the NFS/GPFS server). Simon >I have some experience with the following questions: > >> NFS just isn?t built for security really. I guess NFSv4 with KRB5 is >> one option to look at, with user based credentials. That might just >> about be feasible if the user were do authenticate with kinit before >> being able to access NFSv4 mounted files. I.e. Its done at the user >> level rather than the instance level. That might be an interesting >> project as a feasibility study to look at, will it work? How would >> we integrate into a federated access management system (something >> like UK Federation and ABFAB/Moonshot/Assent maybe?). Could we >> provide easy steps for a user in a VM to follow? Can we even make it >> work with Ganesha in such an environment? > > >Kerberized NFSv3 and Kerberized NFSv4 provide nearly the same level of >security. 
Kerberos makes the difference and not the NFS version. I have >posted some background information to the GPFS forum: >http://ibm.co/1VFLUR4 > >Kerberized NFSv4 has the advantage that it allows different UID/GID ranges >on NFS server and NFS client. I have led a proof-of-concept where we have >used this feature to provide secure data access to personalized patient >data for multiple tenants where the tenants had conflicting UID/GID >ranges. >I have some material which I will share via the GPFS forum. > >UK Federation seems to be based on SAML/Shibboleth. Unfortunately there is >no easy integration of network file protocols such as NFS and SMB and >SAML/Shibboleth, because file protocols require attributes which are >typically not stored in SAML/Shibboleth. Fortunately I provided technical >guidance to a customer who exactly implemented this integration in order >to >provide secure file service to multiple universities, again with >conflicting UID/GID ranges. I need some time to write it up and publish >it. From martin.gasthuber at desy.de Tue Jul 28 17:28:44 2015 From: martin.gasthuber at desy.de (Martin Gasthuber) Date: Tue, 28 Jul 2015 18:28:44 +0200 Subject: [gpfsug-discuss] fast ACL alter solution Message-ID: Hi, for a few months now we have been running a new infrastructure, with the core built on GPFS (4.1.0.8), for 'light source - X-Rays' experiments local to the site. The system is used for the data acquisition chain, data analysis, data exports and archive. Right now we have new detector types (homebuilt, experimental) generating millions of small files - the last run produced ~9 million files at 64 to 128K in size ;-). In our setup, the files get copied to a (user accessible) GPFS instance which controls access by NFSv4 ACLs (only!), and from time to time we have to modify these ACLs (add/remove user/group etc.). With a simple, non-policy-run-based approach, changing 9 million files takes ~200 hours - which we consider not really a good option. Running mmgetacl/mmputacl within a policy run will clearly speed that up - but the biggest time-consuming operations are the get and put ACL calls themselves.
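For reference, a minimal sketch of what such a policy-driven run could look like. The rule names, script path and record parsing below are assumptions - in particular, check the ILM chapter of your release for the exact layout of the file lists handed to the EXEC script:

# aclfix.pol - hand every file (and directory) in the scanned tree to the external script
RULE EXTERNAL LIST 'aclfix' EXEC '/usr/local/bin/aclfix.sh'
RULE 'allfiles' LIST 'aclfix' DIRECTORIES_PLUS

#!/bin/bash
# /usr/local/bin/aclfix.sh - mmapplypolicy calls this with an operation keyword and a file-list path
case "$1" in
  TEST) exit 0 ;;                      # mmapplypolicy probing that the script is usable
  LIST)
    while IFS= read -r rec; do
      path="${rec##* -- }"             # the path follows the ' -- ' separator in each record
      mmgetacl -k nfs4 -o /tmp/acl.$$ "$path"
      # ...edit /tmp/acl.$$ here, i.e. add or remove the ACE in question...
      mmputacl -i /tmp/acl.$$ "$path"
    done < "$2"
    ;;
esac

# run it, spreading the work over a few nodes and threads (node names are placeholders)
mmapplypolicy /gpfs/fs1/detectordata -P aclfix.pol -N node1,node2 -m 24

This parallelises the inode scan and the per-file work, but every file still costs one mmgetacl plus one mmputacl, so it mitigates rather than removes the get/put cost.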
Is anybody aware of any > faster ACL access operation (whithin the policy-run) - or even a > 'mod-acl' operation ? > In the past IBM have said that their expectations are that the ACL's are set via Windows on remote workstations and not from the command line on the GPFS servers themselves!!! Crazy I know. There really needs to be a mm version of the NFSv4 setfacl/nfs4_getfacl commands that ideally makes use of the fast inode traversal features to make things better. In the past I wrote some C code that set specific ACL's on files. This however was to deal with migrating files onto a system and needed to set initial ACL's and didn't make use of the fast traversal features and is completely unpolished. A good starting point would probably be the FreeBSD setfacl/getfacl tools, that at least was my plan but I have never gotten around to it. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From TROPPENS at de.ibm.com Wed Jul 29 09:02:59 2015 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 29 Jul 2015 10:02:59 +0200 Subject: [gpfsug-discuss] GPFS and Community Scientific Cloud In-Reply-To: References: Message-ID: Hi Simon, I have started to draft a response, but it gets longer and longer. I need some more time to respond. Best regards, Ulf. -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 27.07.2015 23:24 Subject: Re: [gpfsug-discuss] GPFS and Community Scientific Cloud Sent by: gpfsug-discuss-bounces at gpfsug.org Hi Ulf, Thanks for the email, as suggested, I'm copying this to the GPFS UG mailing list as well as I'm sure the discussion is of interest to others. I guess what we're looking to do is to have arbitrary VMs running provided by users (I.e. Completely untrusted), but to provide them a way to get secure access to only their data. Right now we can't give them a GPFS client as this is too trusting, I was wondering how easy it would be for us to implement something like: User has a VM User runs 'kinit user at DOMAIN' to gain kerberos ticket and can then securely gain access to only their files from my NFS server. I also mentioned Janet ASSENT, which is a relatively recent project: https://jisc.ac.uk/assent (It was piloted as Janet Moonshot). Which builds on top of SAML to provide other software access to federation. My understanding is that site-specific UID mapping is needed (e.g. On the NFS/GPFS server). Simon >I have some experience with the following questions: > >> NFS just isn?t built for security really. I guess NFSv4 with KRB5 is >> one option to look at, with user based credentials. That might just >> about be feasible if the user were do authenticate with kinit before >> being able to access NFSv4 mounted files. I.e. Its done at the user >> level rather than the instance level. That might be an interesting >> project as a feasibility study to look at, will it work? How would >> we integrate into a federated access management system (something >> like UK Federation and ABFAB/Moonshot/Assent maybe?). Could we >> provide easy steps for a user in a VM to follow? Can we even make it >> work with Ganesha in such an environment? 
> > >Kerberized NFSv3 and Kerberized NFSv4 provide nearly the same level of >security. Kerberos makes the difference and not the NFS version. I have >posted some background information to the GPFS forum: >http://ibm.co/1VFLUR4 > >Kerberized NFSv4 has the advantage that it allows different UID/GID ranges >on NFS server and NFS client. I have led a proof-of-concept where we have >used this feature to provide secure data access to personalized patient >data for multiple tenants where the tenants had conflicting UID/GID >ranges. >I have some material which I will share via the GPFS forum. > >UK Federation seems to be based on SAML/Shibboleth. Unfortunately there is >no easy integration of network file protocols such as NFS and SMB and >SAML/Shibboleth, because file protocols require attributes which are >typically not stored in SAML/Shibboleth. Fortunately I provided technical >guidance to a customer who exactly implemented this integration in order >to >provide secure file service to multiple universities, again with >conflicting UID/GID ranges. I need some time to write it up and publish >it. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chair at gpfsug.org Thu Jul 30 21:36:07 2015 From: chair at gpfsug.org (chair-gpfsug.org) Date: Thu, 30 Jul 2015 21:36:07 +0100 Subject: [gpfsug-discuss] July Meet the devs Message-ID: I've heard some great feedback about the July meet the devs held at IBM Warwick this week. Thanks to Ross and Patrick at IBM and Clare for coordinating the registration for this! Jez has a few photos so we'll try and get those uploaded in the next week or so to the website. Simon (GPFS UG Chair)
From chris.hunter at yale.edu Wed Jul 1 16:52:07 2015 From: chris.hunter at yale.edu (Chris Hunter) Date: Wed, 01 Jul 2015 11:52:07 -0400 Subject: [gpfsug-discuss] gpfs rdma expels Message-ID: <55940CA7.9010506@yale.edu> Hi UG list, We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of clients use RDMA. We see a large number of expels of rdma clients but less of the tcp clients. Most of the gpfs config is at defaults. We are unclear if any of the non-RDMA config items (eg.
Idle socket timeout) would help our issue. Any suggestions on gpfs config parameters we should investigate ? thank-you in advance, chris hunter yale hpc group From viccornell at gmail.com Wed Jul 1 16:58:31 2015 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 1 Jul 2015 16:58:31 +0100 Subject: [gpfsug-discuss] gpfs rdma expels In-Reply-To: <55940CA7.9010506@yale.edu> References: <55940CA7.9010506@yale.edu> Message-ID: <6E28C0FB-2F99-4127-B1F2-272BA2532330@gmail.com> If it used to work then its probably not config. Most expels are the result of network connectivity problems. If your cluster is not too big try looking at ping from every node to every other node and look for large latencies. Also look to see who is expelling who. Ie - if your RDMA nodes are being expelled by non-RDMA nodes. It may point to a weakness in your network which GPFS ,being as it is a great finder of weaknesses, is having a problem with. Also more details (network config etc) will elicit more detailed suggestions. Cheers, Vic > On 1 Jul 2015, at 16:52, Chris Hunter wrote: > > Hi UG list, > We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of clients use RDMA. We see a large number of expels of rdma clients but less of the tcp clients. > Most of the gpfs config is at defaults. We are unclear if any of the non-RDMA config items (eg. Idle socket timeout) would help our issue. Any suggestions on gpfs config parameters we should investigate ? > > thank-you in advance, > chris hunter > yale hpc group > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stijn.deweirdt at ugent.be Thu Jul 2 07:42:30 2015 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Thu, 02 Jul 2015 08:42:30 +0200 Subject: [gpfsug-discuss] gpfs rdma expels In-Reply-To: <6E28C0FB-2F99-4127-B1F2-272BA2532330@gmail.com> References: <55940CA7.9010506@yale.edu> <6E28C0FB-2F99-4127-B1F2-272BA2532330@gmail.com> Message-ID: <5594DD56.6010302@ugent.be> do you use ipoib for the rdma nodes or regular ethernet? and what OS are you on? we had issue with el7.1 kernel and ipoib; there's packet loss with ipoib and mlnx_ofed (and mlnx engineering told that it might be in basic ofed from el7.1 too). 7.0 kernels are ok) and client expels were the first signs on our setup. stijn On 07/01/2015 05:58 PM, Vic Cornell wrote: > If it used to work then its probably not config. Most expels are the result of network connectivity problems. > > If your cluster is not too big try looking at ping from every node to every other node and look for large latencies. > > Also look to see who is expelling who. Ie - if your RDMA nodes are being expelled by non-RDMA nodes. It may point to a weakness in your network which GPFS ,being as it is a great finder of weaknesses, is having a problem with. > > Also more details (network config etc) will elicit more detailed suggestions. > > Cheers, > > Vic > > > >> On 1 Jul 2015, at 16:52, Chris Hunter wrote: >> >> Hi UG list, >> We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of clients use RDMA. We see a large number of expels of rdma clients but less of the tcp clients. >> Most of the gpfs config is at defaults. We are unclear if any of the non-RDMA config items (eg. Idle socket timeout) would help our issue. Any suggestions on gpfs config parameters we should investigate ? 
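On the config-parameter side of that question, the knobs usually looked at are the lease/ping timers and the RDMA settings; a sketch of how to inspect them (parameter names as listed on the developerWorks GPFS tuning wiki - treat the change shown as an illustration, not a recommendation, and note some of these only take effect after a daemon restart):

mmlsconfig | egrep -i 'verbs|lease|ping|failureDetection'   # shows values only where explicitly set
mmdiag --network                                            # per-connection state as the daemon sees it
mmchconfig minMissedPingTimeout=60                          # example: widens the window before an unresponsive node is expelled

Longer timeouts only ride out short hiccups, though - sustained congestion on narrow uplinks will still surface as expels.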
>> >> thank-you in advance, >> chris hunter >> yale hpc group >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From Daniel.Vogel at abcsystems.ch Thu Jul 2 08:12:32 2015 From: Daniel.Vogel at abcsystems.ch (Daniel Vogel) Date: Thu, 2 Jul 2015 07:12:32 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> Message-ID: <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS]Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? 
The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From chris.howarth at citi.com Thu Jul 2 08:24:37 2015 From: chris.howarth at citi.com (Howarth, Chris ) Date: Thu, 2 Jul 2015 07:24:37 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch> <201507011422.t61EMZmw011626@d01av01.pok.ibm.com> <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> Message-ID: <0609A0AC1B1CA9408D88D4144C5C990B75D89CF5@EXLNMB52.eur.nsroot.net> Daniel ?in our environment we have data and metadata split out onto separate drives in separate servers. We also set the GPFS parameter ?mmchconfig defaultHelperNodes=?list_of_metadata_servers? which will automatically only use these nodes for the scan for restriping/rebalancing data (rather than having to specify the ?N option). This dramatically reduced the impact to clients accessing the data nodes while these activities are taking place. Also using SSDs for metadata nodes can make a big improvement. Chris From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Daniel Vogel Sent: Thursday, July 02, 2015 8:13 AM To: 'gpfsug main discussion list' Subject: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . 
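A rough sketch of the two approaches above, with made-up node and filesystem names (nsd01, nsd02 and gpfs01 are placeholders, not from this thread):

# keep long-running maintenance scans on a couple of nodes that are not performance critical
mmchconfig defaultHelperNodes="nsd01,nsd02"

# or name the nodes explicitly on the command itself
mmrestripefs gpfs01 -b -N nsd01,nsd02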
sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS]Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From chris.hunter at yale.edu Thu Jul 2 14:01:53 2015 From: chris.hunter at yale.edu (Chris Hunter) Date: Thu, 02 Jul 2015 09:01:53 -0400 Subject: [gpfsug-discuss] gpfs rdma expels Message-ID: <55953641.4010701@yale.edu> Thanks for the feedback. Our network is non-uniform, we have three (uniform) rdma networks connected by narrow uplinks. Previously we used gpfs on one network, now we wish to expand to the other networks. Previous experience shows we see "PortXmitWait" messages from traffic over the narrow uplinks. We find expels happen often from gpfs communication over the narrow uplinks. We acknowledge an inherent weakness with narrow uplinks but for practical reasons it would be difficult to resolve. So the question, is it possible to configure gpfs to be tolerant of non-uniform networks with narrow uplinks ? thanks, chris hunter > On 1 Jul 2015, at 16:52, Chris Hunter wrote: > > Hi UG list, > We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of > clients use RDMA. We see a large number of expels of rdma clients but > less of the tcp clients. Most of the gpfs config is at defaults. We > are unclear if any of the non-RDMA config items (eg. Idle socket > timeout) would help our issue. Any suggestions on gpfs config > parameters we should investigate ? From S.J.Thompson at bham.ac.uk Thu Jul 2 16:43:03 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 2 Jul 2015 15:43:03 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support Message-ID: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? 
>From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon From GARWOODM at uk.ibm.com Thu Jul 2 16:55:42 2015 From: GARWOODM at uk.ibm.com (Michael Garwood7) Date: Thu, 2 Jul 2015 16:55:42 +0100 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Simon, 1. Most of the chef recipes involve installing the various packages required for the protocols, and some of the new performance monitoring packages required for mmperfquery. There is a series of steps for proper manual install at http://www-01.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.adv.doc/bl1adv_ces_features.htm but this assumes you have all IBM Samba RPMs and prerequisites installed. The recipes *should* be split out so that at the very least, RPM install is done in its own recipe without configuring or enabling anything... 2. I am not 100% sure what deploying IBM Samba on the cluster will do with regards to sernet samba. As far as I am aware there is no code in the installer or chef recipes to check for other samba deployments running but I may be mistaken. Depending on how sernet samba hooks to GPFS I can't think of any reason why it would cause problems aside from the risk of the protocols not communicating and causing issues with file locks/data overwrites, depending on what workload you have running on samba. 3. I haven't personally seen multicluster deployments done or tested before, but no, it is not officially supported. The installer has been written with the assumption that you are installing to one cluster, so I wouldn't recommend trying with multiple clusters - unforseen consequences :) Regards, Michael Garwood IBM Systems Developer Phone: 44-161-905-4118 E-mail: GARWOODM at uk.ibm.com 40 Blackfriars Street Manchester, M3 2EG United Kingdom IBM United Kingdom Limited Registered in England and Wales with number 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list , Date: 02/07/2015 16:43 Subject: [gpfsug-discuss] 4.1.1 protocol support Sent by: gpfsug-discuss-bounces at gpfsug.org Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. 
My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Thu Jul 2 17:02:01 2015 From: oester at gmail.com (Bob Oesterlin) Date: Thu, 2 Jul 2015 11:02:01 -0500 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. You do have the option of doing things manually as well. The guide that describes it is pretty good. If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Hi, > > Just wondering if anyone has looked at the new protocol support stuff in > 4.1.1 yet? > > From what I can see, it wants to use the installer to add things like IBM > Samba onto nodes in the cluster. The docs online seem to list manual > installation as running the chef template, which is hardly manual... > > 1. Id like to know what is being run on my cluster > 2. Its an existing install which was using sernet samba, so I don't want > to go out and break anything inadvertently > 3. My protocol nodes are in a multicluster, and I understand the installer > doesn't support multicluster. > > (the docs state that multicluster isn't supported but something like its > expected to work). > > So... Has anyone had a go at this yet and have a set of steps? > > I've started unpicking the chef recipe, but just wondering if anyone had > already had a go at this? > > (and lets not start on the mildy bemusing error when you "enable" the > service with "mmces service enable" (ces service not enabled) - there's > other stuff to enable it)... > > Simon > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Thu Jul 2 19:52:28 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 2 Jul 2015 18:52:28 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Michael, Thanks for that link. This is the docs I?d found before: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_manualprotocols.htm I guess one of the reasons for wanting to unpick is because we already have configuration management tools all in place. I have no issue about GPFS config being inside GPFS, but we really need to know what is going on (and we can manage to get the RPMs all on etc if we know what is needed from the config management tool). I do note that it needs CCR enabled, which we currently don?t have. Now I think this was because we saw issues with mmsdrestore when adding a node that had been reinstalled back into the cluster. I need to check if that is still the case (we work on being able to pull clients, NSDs etc from the cluster and using xcat to reprovision and the a config tool to do the relevant bits to rejoin the cluster ? makes it easier for us to stage kernel, GPFS, OFED updates as we just blat on a new image). I don?t really want to have a mix of Sernet and IBM samba on there, so am happy to pull out those bits, but obviously need to get the IBM bits working as well. Multicluster ? well, our ?protocol? cluster is a separate cluster from the NSD cluster (can?t remote expel, might want to add other GPFS clusters to the protocol layer etc). Of course the multi cluster talks GPFS protocol, so I don?t see any reason why it shouldn?t work, but yes, noted its not supported. Simon From: Michael Garwood7 > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 16:55 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon, 1. Most of the chef recipes involve installing the various packages required for the protocols, and some of the new performance monitoring packages required for mmperfquery. There is a series of steps for proper manual install at http://www-01.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.adv.doc/bl1adv_ces_features.htm but this assumes you have all IBM Samba RPMs and prerequisites installed. The recipes *should* be split out so that at the very least, RPM install is done in its own recipe without configuring or enabling anything... 2. I am not 100% sure what deploying IBM Samba on the cluster will do with regards to sernet samba. As far as I am aware there is no code in the installer or chef recipes to check for other samba deployments running but I may be mistaken. Depending on how sernet samba hooks to GPFS I can't think of any reason why it would cause problems aside from the risk of the protocols not communicating and causing issues with file locks/data overwrites, depending on what workload you have running on samba. 3. I haven't personally seen multicluster deployments done or tested before, but no, it is not officially supported. 
The installer has been written with the assumption that you are installing to one cluster, so I wouldn't recommend trying with multiple clusters - unforseen consequences :) Regards, Michael Garwood IBM Systems Developer ________________________________ Phone: 44-161-905-4118 E-mail: GARWOODM at uk.ibm.com 40 Blackfriars Street Manchester, M3 2EG United Kingdom IBM United Kingdom Limited Registered in England and Wales with number 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU From: "Simon Thompson (Research Computing - IT Services)" > To: gpfsug main discussion list >, Date: 02/07/2015 16:43 Subject: [gpfsug-discuss] 4.1.1 protocol support Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 2 19:58:12 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 2 Jul 2015 18:58:12 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: Hi Bob, Thanks, I?ll have a look through the link Michael sent me and shout if I get stuck? Looks a bit different to the previous way were we running this with ctdb etc. Our protocol nodes are already running 7.1 (though CentOS which means the mmbuildgpl command doesn?t work, would be much nice of course if the init script detected the kernel had changed and did a build etc automagically ?). Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 17:02 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. You do have the option of doing things manually as well. The guide that describes it is pretty good. 
If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Thu Jul 2 20:03:02 2015 From: oester at gmail.com (Bob Oesterlin) Date: Thu, 2 Jul 2015 14:03:02 -0500 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: Message-ID: On Thu, Jul 2, 2015 at 1:52 PM, Simon Thompson (Research Computing - IT Services) wrote: > I do note that it needs CCR enabled, which we currently don?t have. Now I > think this was because we saw issues with mmsdrestore when adding a node > that had been reinstalled back into the cluster. I need to check if that is > still the case (we work on being able to pull clients, NSDs etc from the > cluster and using xcat to reprovision and the a config tool to do the > relevant bits to rejoin the cluster ? makes it easier for us to stage > kernel, GPFS, OFED updates as we just blat on a new image). > Yes, and this is why we couldn't use CCR - our compute nodes are netboot, so they go thru a mmsdrrestore every time they reboot. Now, they have fixed this in 4.1.1, which means if you can get (the cluster) to 4.1.1 and turn on CCR, mmsdrrestore should work. Note to self: Test this out in your sandbox cluster. :-) Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 3 12:22:06 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 3 Jul 2015 11:22:06 +0000 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) Message-ID: Bob, (anyone?) Have you tried mmsdrestore to see if its working in 4.1.1? # mmsdrrestore -p PRIMARY -R /usr/bin/scp Fri 3 Jul 11:56:05 BST 2015: mmsdrrestore: Processing node PRIMARY ccrio initialization failed (err 811) mmsdrrestore: Unable to retrieve GPFS cluster files from CCR. mmsdrrestore: Unexpected error from updateMmfsEnvironment. Return code: 1 mmsdrrestore: Command failed. Examine previous error messages to determine cause. 
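(For what it's worth, afterwards I'm just checking by hand what actually landed on the node - plain shell, assuming the default /var/mmfs paths:

# did the cluster configuration file get pulled across from the primary?
ls -l /var/mmfs/gen/mmsdrfs

# and what state the daemon thinks it is in
/usr/lpp/mmfs/bin/mmgetstate
)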
It seems to copy the mmsdrfs file to the local node into /var/mmfs/gen/mmsdrfs but then fails to actually work. Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 20:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support On Thu, Jul 2, 2015 at 1:52 PM, Simon Thompson (Research Computing - IT Services) > wrote: I do note that it needs CCR enabled, which we currently don?t have. Now I think this was because we saw issues with mmsdrestore when adding a node that had been reinstalled back into the cluster. I need to check if that is still the case (we work on being able to pull clients, NSDs etc from the cluster and using xcat to reprovision and the a config tool to do the relevant bits to rejoin the cluster ? makes it easier for us to stage kernel, GPFS, OFED updates as we just blat on a new image). Yes, and this is why we couldn't use CCR - our compute nodes are netboot, so they go thru a mmsdrrestore every time they reboot. Now, they have fixed this in 4.1.1, which means if you can get (the cluster) to 4.1.1 and turn on CCR, mmsdrrestore should work. Note to self: Test this out in your sandbox cluster. :-) Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 3 12:50:31 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 3 Jul 2015 11:50:31 +0000 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: Actually, no just ignore me, it does appear to be fixed in 4.1.1 * I cleaned up the node by removing the 4.1.1 packages, then cleaned up /var/mmfs, but then when the config tool reinstalled, it put 4.1.0 back on and didn?t apply the updates to 4.1.1, so it must have been an older version of mmsdrrestore Simon From: Simon Thompson > Reply-To: gpfsug main discussion list > Date: Friday, 3 July 2015 12:22 To: gpfsug main discussion list > Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) Bob, (anyone?) Have you tried mmsdrestore to see if its working in 4.1.1? # mmsdrrestore -p PRIMARY -R /usr/bin/scp Fri 3 Jul 11:56:05 BST 2015: mmsdrrestore: Processing node PRIMARY ccrio initialization failed (err 811) mmsdrrestore: Unable to retrieve GPFS cluster files from CCR. mmsdrrestore: Unexpected error from updateMmfsEnvironment. Return code: 1 mmsdrrestore: Command failed. Examine previous error messages to determine cause. It seems to copy the mmsdrfs file to the local node into /var/mmfs/gen/mmsdrfs but then fails to actually work. Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Thursday, 2 July 2015 20:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support On Thu, Jul 2, 2015 at 1:52 PM, Simon Thompson (Research Computing - IT Services) > wrote: I do note that it needs CCR enabled, which we currently don?t have. Now I think this was because we saw issues with mmsdrestore when adding a node that had been reinstalled back into the cluster. I need to check if that is still the case (we work on being able to pull clients, NSDs etc from the cluster and using xcat to reprovision and the a config tool to do the relevant bits to rejoin the cluster ? makes it easier for us to stage kernel, GPFS, OFED updates as we just blat on a new image). 
Yes, and this is why we couldn't use CCR - our compute nodes are netboot, so they go thru a mmsdrrestore every time they reboot. Now, they have fixed this in 4.1.1, which means if you can get (the cluster) to 4.1.1 and turn on CCR, mmsdrrestore should work. Note to self: Test this out in your sandbox cluster. :-) Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Jul 3 13:21:43 2015 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 3 Jul 2015 07:21:43 -0500 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: On Fri, Jul 3, 2015 at 6:22 AM, Simon Thompson (Research Computing - IT Services) wrote: > Have you tried mmsdrestore to see if its working in 4.1.1? Well, no actually :) They told me it was fixed but I have never got 'round to checking it during my beta testing. If it's not, I say submit a PMR and let's get them to fix it - I will do the same. It would be nice to actually use CCR, especially if the new protocol support depends on it. Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Jul 3 13:22:37 2015 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 3 Jul 2015 07:22:37 -0500 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: On Fri, Jul 3, 2015 at 6:22 AM, Simon Thompson (Research Computing - IT Services) wrote: > Have you tried mmsdrestore to see if its working in 4.1.1? One thing - did you try this on a pure 4.1.1 cluster with release=LATEST? Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 3 13:28:10 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 3 Jul 2015 12:28:10 +0000 Subject: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) In-Reply-To: References: Message-ID: It was on a pure cluster with 4.1.1 only. (I had to do that a precursor to start enabling CES). As I mentioned, I messed up with 4.1.0 client installed so it doesn?t work from a mixed version, but did work from pure 4.1.1 Simon From: Bob Oesterlin > Reply-To: gpfsug main discussion list > Date: Friday, 3 July 2015 13:22 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] mmsdrestore with CCR enabled (was Re: 4.1.1 protocol support) On Fri, Jul 3, 2015 at 6:22 AM, Simon Thompson (Research Computing - IT Services) > wrote: Have you tried mmsdrestore to see if its working in 4.1.1? One thing - did you try this on a pure 4.1.1 cluster with release=LATEST? Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Fri Jul 3 23:48:38 2015 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 3 Jul 2015 15:48:38 -0700 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch><201507011422.t61EMZmw011626@d01av01.pok.ibm.com> <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> Message-ID: <201507032249.t63Mnffp025995@d03av03.boulder.ibm.com> this triggers a few questions 1. have you tried running it only on a node that doesn't serve NFS data ? 2. what NFS stack are you using ? 
is this the kernel NFS Server as part of linux means you use cNFS ? if the answer to 2 is yes, have you adjusted the nfsd threads in /etc/sysconfig/nfs ? the default is only 8 and if you run with the default you have a very low number of threads from the outside competing with a larger number of threads doing restripe, increasing the nfsd threads could help. you could also reduce the number of internal restripe threads to try out if that helps mitigating the impact. to try an extreme low value set the following : mmchconfig pitWorkerThreadsPerNode=1 -i and retry the restripe again, to reset it back to default run mmchconfig pitWorkerThreadsPerNode=DEFAULT -i sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Daniel Vogel To: "'gpfsug main discussion list'" Date: 07/02/2015 12:12 AM Subject: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoSDaniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel To: "'gpfsug-discuss at gpfsug.org'" Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. 
These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Jul 6 11:09:08 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 6 Jul 2015 10:09:08 +0000 Subject: [gpfsug-discuss] SMB support and config Message-ID: Hi, (sorry, lots of questions about this stuff at the moment!) I?m currently looking at removing the sernet smb configs we had previously and moving to IBM SMB. I?ve removed all the old packages and only now have gpfs.smb installed on the systems. I?m struggling to get the config tools to work for our environment. We have MS Windows AD Domain for authentication. For various reasons, however doesn?t hold the UIDs/GIDs, which are instead held in a different LDAP directory. In the past, we?d configure the Linux servers running Samba so that NSLCD was configured to get details from the LDAP server. (e.g. getent passwd would return the data for an AD user). The Linux boxes would also be configured to use KRB5 authentication where users were allowed to ssh etc in for password authentication. So as far as Samba was concerned, it would do ?security = ADS? and then we?d also have "idmap config * : backend = tdb2? I.e. Use Domain for authentication, but look locally for ID mapping data. Now I can configured IBM SMB to use ADS for authentication: mmuserauth service create --type ad --data-access-method file --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF --idmap-role subordinate However I can?t see anyway for me to manipulate the config so that it doesn?t use autorid. Using this we end up with: mmsmb config list | grep -i idmap idmap config * : backend autorid idmap config * : range 10000000-299999999 idmap config * : rangesize 1000000 idmap config * : read only yes idmap:cache no It also adds: mmsmb config list | grep -i auth auth methods guest sam winbind (though I don?t think that is a problem). I also can?t change the idmap using the mmsmb command (I think would look like this): # mmsmb config change --option="idmap config * : backend=tdb2" idmap config * : backend=tdb2: [E] Unsupported smb option. More information about smb options is availabe in the man page. I can?t see anything in the docs at: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm That give me a clue how to do what I want. I?d be happy to do some mixture of AD for authentication and LDAP for lookups (rather than just falling back to ?local? 
from nslcd), but I can?t see a way to do this, and ?manual? seems to stop ADS authentication in Samba. Anyone got any suggestions? Thanks Simon From kallbac at iu.edu Mon Jul 6 23:06:00 2015 From: kallbac at iu.edu (Kallback-Rose, Kristy A) Date: Mon, 6 Jul 2015 22:06:00 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: References: Message-ID: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Just to chime in as another interested party, we do something fairly similar but use sssd instead of nslcd. Very interested to see how accommodating the IBM Samba is to local configuration needs. Best, Kristy On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT Services) wrote: > Hi, > > (sorry, lots of questions about this stuff at the moment!) > > I?m currently looking at removing the sernet smb configs we had previously > and moving to IBM SMB. I?ve removed all the old packages and only now have > gpfs.smb installed on the systems. > > I?m struggling to get the config tools to work for our environment. > > We have MS Windows AD Domain for authentication. For various reasons, > however doesn?t hold the UIDs/GIDs, which are instead held in a different > LDAP directory. > > In the past, we?d configure the Linux servers running Samba so that NSLCD > was configured to get details from the LDAP server. (e.g. getent passwd > would return the data for an AD user). The Linux boxes would also be > configured to use KRB5 authentication where users were allowed to ssh etc > in for password authentication. > > So as far as Samba was concerned, it would do ?security = ADS? and then > we?d also have "idmap config * : backend = tdb2? > > I.e. Use Domain for authentication, but look locally for ID mapping data. > > Now I can configured IBM SMB to use ADS for authentication: > > mmuserauth service create --type ad --data-access-method file > --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF > --idmap-role subordinate > > > However I can?t see anyway for me to manipulate the config so that it > doesn?t use autorid. Using this we end up with: > > mmsmb config list | grep -i idmap > idmap config * : backend autorid > idmap config * : range 10000000-299999999 > idmap config * : rangesize 1000000 > idmap config * : read only yes > idmap:cache no > > > It also adds: > > mmsmb config list | grep -i auth > auth methods guest sam winbind > > (though I don?t think that is a problem). > > > I also can?t change the idmap using the mmsmb command (I think would look > like this): > # mmsmb config change --option="idmap config * : backend=tdb2" > idmap config * : backend=tdb2: [E] Unsupported smb option. More > information about smb options is availabe in the man page. > > > > I can?t see anything in the docs at: > http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect > rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm > > That give me a clue how to do what I want. > > I?d be happy to do some mixture of AD for authentication and LDAP for > lookups (rather than just falling back to ?local? from nslcd), but I can?t > see a way to do this, and ?manual? seems to stop ADS authentication in > Samba. > > Anyone got any suggestions? 
> > > Thanks > > Simon > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Tue Jul 7 12:39:24 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 7 Jul 2015 11:39:24 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Message-ID: So based on what I?m seeing ... When you run mmstartup, the start process edits /etc/nsswitch.conf. I?ve managed to make it work in my environment, but I had to edit the file /usr/lpp/mmfs/bin/mmcesop to make it put ldap instead of winbind when it starts up. I also had to do some studious use of "net conf delparm? ? Which is probably not a good idea. I did try using: mmuserauth service create --type userdefined --data-access-method file And the setting the "security = ADS? parameters by hand with "net conf? (can?t do it with mmsmb), and a manual ?net ads join" but I couldn?t get it to authenticate clients properly. I can?t work out why just at the moment. But even then when mmshutdown runs, it still goes ahead and edits /etc/nsswitch.conf I?ve got a ticket open with IBM at the moment via our integrator to see what they say. But I?m not sure I like something going off and poking things like /etc/nsswitch.conf at startup/shutdown. I can sorta see that at config time, but when service start etc, I?m not sure I really like that idea! Simon On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: >Just to chime in as another interested party, we do something fairly >similar but use sssd instead of nslcd. Very interested to see how >accommodating the IBM Samba is to local configuration needs. > >Best, >Kristy > >On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >Services) wrote: > >> Hi, >> >> (sorry, lots of questions about this stuff at the moment!) >> >> I?m currently looking at removing the sernet smb configs we had >>previously >> and moving to IBM SMB. I?ve removed all the old packages and only now >>have >> gpfs.smb installed on the systems. >> >> I?m struggling to get the config tools to work for our environment. >> >> We have MS Windows AD Domain for authentication. For various reasons, >> however doesn?t hold the UIDs/GIDs, which are instead held in a >>different >> LDAP directory. >> >> In the past, we?d configure the Linux servers running Samba so that >>NSLCD >> was configured to get details from the LDAP server. (e.g. getent passwd >> would return the data for an AD user). The Linux boxes would also be >> configured to use KRB5 authentication where users were allowed to ssh >>etc >> in for password authentication. >> >> So as far as Samba was concerned, it would do ?security = ADS? and then >> we?d also have "idmap config * : backend = tdb2? >> >> I.e. Use Domain for authentication, but look locally for ID mapping >>data. >> >> Now I can configured IBM SMB to use ADS for authentication: >> >> mmuserauth service create --type ad --data-access-method file >> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >> --idmap-role subordinate >> >> >> However I can?t see anyway for me to manipulate the config so that it >> doesn?t use autorid. 
Using this we end up with: >> >> mmsmb config list | grep -i idmap >> idmap config * : backend autorid >> idmap config * : range 10000000-299999999 >> idmap config * : rangesize 1000000 >> idmap config * : read only yes >> idmap:cache no >> >> >> It also adds: >> >> mmsmb config list | grep -i auth >> auth methods guest sam winbind >> >> (though I don?t think that is a problem). >> >> >> I also can?t change the idmap using the mmsmb command (I think would >>look >> like this): >> # mmsmb config change --option="idmap config * : backend=tdb2" >> idmap config * : backend=tdb2: [E] Unsupported smb option. More >> information about smb options is availabe in the man page. >> >> >> >> I can?t see anything in the docs at: >> >>http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>ct >> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >> >> That give me a clue how to do what I want. >> >> I?d be happy to do some mixture of AD for authentication and LDAP for >> lookups (rather than just falling back to ?local? from nslcd), but I >>can?t >> see a way to do this, and ?manual? seems to stop ADS authentication in >> Samba. >> >> Anyone got any suggestions? >> >> >> Thanks >> >> Simon >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From TROPPENS at de.ibm.com Thu Jul 9 07:55:24 2015 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Thu, 9 Jul 2015 08:55:24 +0200 Subject: [gpfsug-discuss] ISC 2015 Message-ID: Anybody at ISC 2015 in Frankfurt next week? I am happy to share my experience with supporting four ESP (a.k.a beta) customers of the new protocol feature. You can find me at the IBM booth (Booth 928). -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From daniel.kidger at uk.ibm.com Thu Jul 9 09:12:51 2015 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Thu, 9 Jul 2015 09:12:51 +0100 Subject: [gpfsug-discuss] ISC 2015 In-Reply-To: Message-ID: <1970894201.4637011436429559512.JavaMail.notes@d06wgw86.portsmouth.uk.ibm.com> Ulf, I am certainly interested. You can find me on the IBM booth too :-) Looking forward to meeting you. Daniel Sent from IBM Verse Ulf Troppens --- [gpfsug-discuss] ISC 2015 --- From:"Ulf Troppens" To:"gpfsug main discussion list" Date:Thu, 9 Jul 2015 08:55Subject:[gpfsug-discuss] ISC 2015 Anybody at ISC 2015 in Frankfurt next week? I am happy to share my experience with supporting four ESP (a.k.a beta) customers of the new protocol feature. You can find me at the IBM booth (Booth 928). 
-- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Jul 9 15:56:42 2015 From: ewahl at osc.edu (Wahl, Edward) Date: Thu, 9 Jul 2015 14:56:42 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: References: , Message-ID: <9DA9EC7A281AC7428A9618AFDC49049955A5DCC4@CIO-KRC-D1MBX02.osuad.osu.edu> Please please please please PLEASE tell me that support for RHEL 6 is in the plan for protocol nodes. Forcing us to 7 seems rather VERY premature. been out sick a week so I just saw this, FYI. I'd sell my co-workers to test out protocol nodes, but frankly NOT on RHEL 7. Definitely NOT an HPC ready release. ugh. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Bob Oesterlin [oester at gmail.com] Sent: Thursday, July 02, 2015 12:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. You do have the option of doing things manually as well. The guide that describes it is pretty good. If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sdinardo at ebi.ac.uk Fri Jul 10 11:07:28 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Fri, 10 Jul 2015 11:07:28 +0100 Subject: [gpfsug-discuss] data interface and management infercace. Message-ID: <559F9960.7010509@ebi.ac.uk> Hello guys. Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: --daemon-interface={hostname | ip_address} Specifies the host name or IP address _*to be used by the GPFS daemons for node-to-node communication*_. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command tothe desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. What exactly means "node-to node-communications" ? Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? _**_ Thanks in advance, Salvatore -------------- next part -------------- An HTML attachment was scrubbed... 
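(To make the question concrete: if splitting the traffic is just a matter of giving each node a second hostname on the 1g copper link and pointing the admin interface at it, I guess the commands would look something like this - the *-admin names are invented:

mmchnode --admin-interface=gss01a-admin.ebi.ac.uk -N gss01a.ebi.ac.uk
mmchnode --admin-interface=gss01b-admin.ebi.ac.uk -N gss01b.ebi.ac.uk

with --daemon-interface left pointing at the bonded 10g interfaces - but I am not sure whether that actually moves the lease/token traffic off the data network, which is really what I am asking.)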
URL: From secretary at gpfsug.org Fri Jul 10 12:33:48 2015 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Fri, 10 Jul 2015 12:33:48 +0100 Subject: [gpfsug-discuss] Places available: Meet the Devs Message-ID: Dear All, There are a couple of places remaining at the next 'Meet the Devs' event on Wednesday 29th July, 11am-3pm. The event is being held at IBM Warwick. The agenda promises to be hands on and give you the opportunity to speak face to face with the developers of GPFS. Guideline agenda: * Data analytic workloads - development to show and tell UK work on establishing use cases and tighter integration of Spark on top of GPFS * Show the GUI coming in 4.2 * Discuss 4.2 and beyond roadmap * How would you like IP management to work for protocol access? * Optional - Team can demo & discuss NFS/SMB/Object integration into Scale Lunch and refreshments will be provided. Please can you let me know by email if you are interested in attending and I'll register your place. Thanks and we hope to see you there! -- Claire O'Toole (n?e Robson) GPFS User Group Secretary +44 (0)7508 033896 www.gpfsug.org From S.J.Thompson at bham.ac.uk Fri Jul 10 12:59:19 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 10 Jul 2015 11:59:19 +0000 Subject: [gpfsug-discuss] 4.1.1 protocol support In-Reply-To: <9DA9EC7A281AC7428A9618AFDC49049955A5DCC4@CIO-KRC-D1MBX02.osuad.osu.edu> References: <9DA9EC7A281AC7428A9618AFDC49049955A5DCC4@CIO-KRC-D1MBX02.osuad.osu.edu> Message-ID: Hi Ed, Well, technically: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_protocolsprerequisites.htm Says "The spectrumscale installation toolkit supports Red Hat Enterprise Linux 7.0 and 7.1 platforms on x86_64 and ppc64 architectures" So maybe if you don?t want to use the installer, you don't need RHEL 7. Of course where or not that is supported, only IBM would be able to say ? I?ve only looked at gpfs.smb, but as its provided as a binary RPM, it might or might not work in a 6 environment (it bundles ctdb etc all in). For object, as its a bundle of openstack RPMs, then potentially it won?t work on EL6 depending on the python requirements? And surely you aren?t running protocol support on HPC nodes anyway ... so maybe a few EL7 nodes could work for you? Simon From: , Edward > Reply-To: gpfsug main discussion list > Date: Thursday, 9 July 2015 15:56 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Please please please please PLEASE tell me that support for RHEL 6 is in the plan for protocol nodes. Forcing us to 7 seems rather VERY premature. been out sick a week so I just saw this, FYI. I'd sell my co-workers to test out protocol nodes, but frankly NOT on RHEL 7. Definitely NOT an HPC ready release. ugh. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Bob Oesterlin [oester at gmail.com] Sent: Thursday, July 02, 2015 12:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.1.1 protocol support Hi Simon I was part of a beta program for GPFS (ok, better start saying Spectrum Scale!) 4.1.1, so I've had some experience with the toolkit that installs the protocol nodes. The new protocol nodes MUST be RH7, so it's going to be a bit more of an involved process to migrate to this level than in the past. The GPFS server nodes/client nodes can remain at RH6 is needed. Overall it works pretty well. 
You do have the option of doing things manually as well. The guide that describes it is pretty good. If you want to discuss the process in detail, I'd be happy to do so - a bit too much to cover over a mailing list. Bob Oesterlin Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Thu, Jul 2, 2015 at 10:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: Hi, Just wondering if anyone has looked at the new protocol support stuff in 4.1.1 yet? >From what I can see, it wants to use the installer to add things like IBM Samba onto nodes in the cluster. The docs online seem to list manual installation as running the chef template, which is hardly manual... 1. Id like to know what is being run on my cluster 2. Its an existing install which was using sernet samba, so I don't want to go out and break anything inadvertently 3. My protocol nodes are in a multicluster, and I understand the installer doesn't support multicluster. (the docs state that multicluster isn't supported but something like its expected to work). So... Has anyone had a go at this yet and have a set of steps? I've started unpicking the chef recipe, but just wondering if anyone had already had a go at this? (and lets not start on the mildy bemusing error when you "enable" the service with "mmces service enable" (ces service not enabled) - there's other stuff to enable it)... Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 10 13:06:01 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 10 Jul 2015 12:06:01 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Message-ID: So IBM came back and said what I was doing wasn?t supported. They did say that you can use ?user defined? authentication. Which I?ve got working now on my environment (figured what I was doing wrong, and you can?t use mmsmb to do some of the bits I need for it to work for user defined mode for me...). But I still think it needs a patch to one of the files for CES for use in user defined authentication. (Right now it appears to remove all my ?user defined? settings from nsswitch.conf when you stop CES/GPFS on a node). I?ve supplied my patch to IBM which works for my case, we?ll see what they do about it? (If people are interested, I?ll gather my notes into a blog post). Simon On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: >Just to chime in as another interested party, we do something fairly >similar but use sssd instead of nslcd. Very interested to see how >accommodating the IBM Samba is to local configuration needs. > >Best, >Kristy > >On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >Services) wrote: > >> Hi, >> >> (sorry, lots of questions about this stuff at the moment!) >> >> I?m currently looking at removing the sernet smb configs we had >>previously >> and moving to IBM SMB. I?ve removed all the old packages and only now >>have >> gpfs.smb installed on the systems. >> >> I?m struggling to get the config tools to work for our environment. >> >> We have MS Windows AD Domain for authentication. 
For various reasons, >> however doesn?t hold the UIDs/GIDs, which are instead held in a >>different >> LDAP directory. >> >> In the past, we?d configure the Linux servers running Samba so that >>NSLCD >> was configured to get details from the LDAP server. (e.g. getent passwd >> would return the data for an AD user). The Linux boxes would also be >> configured to use KRB5 authentication where users were allowed to ssh >>etc >> in for password authentication. >> >> So as far as Samba was concerned, it would do ?security = ADS? and then >> we?d also have "idmap config * : backend = tdb2? >> >> I.e. Use Domain for authentication, but look locally for ID mapping >>data. >> >> Now I can configured IBM SMB to use ADS for authentication: >> >> mmuserauth service create --type ad --data-access-method file >> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >> --idmap-role subordinate >> >> >> However I can?t see anyway for me to manipulate the config so that it >> doesn?t use autorid. Using this we end up with: >> >> mmsmb config list | grep -i idmap >> idmap config * : backend autorid >> idmap config * : range 10000000-299999999 >> idmap config * : rangesize 1000000 >> idmap config * : read only yes >> idmap:cache no >> >> >> It also adds: >> >> mmsmb config list | grep -i auth >> auth methods guest sam winbind >> >> (though I don?t think that is a problem). >> >> >> I also can?t change the idmap using the mmsmb command (I think would >>look >> like this): >> # mmsmb config change --option="idmap config * : backend=tdb2" >> idmap config * : backend=tdb2: [E] Unsupported smb option. More >> information about smb options is availabe in the man page. >> >> >> >> I can?t see anything in the docs at: >> >>http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>ct >> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >> >> That give me a clue how to do what I want. >> >> I?d be happy to do some mixture of AD for authentication and LDAP for >> lookups (rather than just falling back to ?local? from nslcd), but I >>can?t >> see a way to do this, and ?manual? seems to stop ADS authentication in >> Samba. >> >> Anyone got any suggestions? >> >> >> Thanks >> >> Simon >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Daniel.Vogel at abcsystems.ch Fri Jul 10 15:19:11 2015 From: Daniel.Vogel at abcsystems.ch (Daniel Vogel) Date: Fri, 10 Jul 2015 14:19:11 +0000 Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? In-Reply-To: <201507032249.t63Mnffp025995@d03av03.boulder.ibm.com> References: <2CDF270206A255459AC4FA6B08E52AF90114634DD0@ABCSYSEXC1.abcsystems.ch><201507011422.t61EMZmw011626@d01av01.pok.ibm.com> <2CDF270206A255459AC4FA6B08E52AF901146351BD@ABCSYSEXC1.abcsystems.ch> <201507032249.t63Mnffp025995@d03av03.boulder.ibm.com> Message-ID: <2CDF270206A255459AC4FA6B08E52AF90114635E8E@ABCSYSEXC1.abcsystems.ch> For ?1? we use the quorum node to do ?start disk? or ?restripe file system? (quorum node without disks). For ?2? we use kernel NFS with cNFS I used the command ?cnfsNFSDprocs 64? to set the NFS threads. Is this correct? 
gpfs01:~ # cat /proc/fs/nfsd/threads 64 I will verify the settings in our lab, will use the following configuration: mmchconfig worker1Threads=128 mmchconfig prefetchThreads=128 mmchconfig nsdMaxWorkerThreads=128 mmchconfig cnfsNFSDprocs=256 daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Samstag, 4. Juli 2015 00:49 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? this triggers a few questions 1. have you tried running it only on a node that doesn't serve NFS data ? 2. what NFS stack are you using ? is this the kernel NFS Server as part of linux means you use cNFS ? if the answer to 2 is yes, have you adjusted the nfsd threads in /etc/sysconfig/nfs ? the default is only 8 and if you run with the default you have a very low number of threads from the outside competing with a larger number of threads doing restripe, increasing the nfsd threads could help. you could also reduce the number of internal restripe threads to try out if that helps mitigating the impact. to try an extreme low value set the following : mmchconfig pitWorkerThreadsPerNode=1 -i and retry the restripe again, to reset it back to default run mmchconfig pitWorkerThreadsPerNode=DEFAULT -i sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Beschreibung: Inactive hide details for Daniel Vogel ---07/02/2015 12:12:46 AM---Sven, Yes I agree, but ?using ?N? to reduce the load help]Daniel Vogel ---07/02/2015 12:12:46 AM---Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as From: Daniel Vogel > To: "'gpfsug main discussion list'" > Date: 07/02/2015 12:12 AM Subject: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, Yes I agree, but ?using ?N? to reduce the load helps not really. If I use NFS, for example, as a ESX data store, ESX I/O latency for NFS goes very high, the VM?s hangs. By the way I use SSD PCIe cards, perfect ?mirror speed? but slow I/O on NFS. The GPFS cluster concept I use are different than GSS or traditional FC (shared storage). I use shared nothing with IB (no FPO), many GPFS nodes with NSD?s. I know the need to resync the FS with mmchdisk / mmrestripe will happen more often. The only one feature will help is QoS for the GPFS admin jobs. I hope we are not fare away from this. Thanks, Daniel Von: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] Im Auftrag von Sven Oehme Gesendet: Mittwoch, 1. Juli 2015 16:21 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Daniel, as you know, we can't discuss future / confidential items on a mailing list. what i presented as an outlook to future releases hasn't changed from a technical standpoint, we just can't share a release date until we announce it official. there are multiple ways today to limit the impact on restripe and other tasks, the best way to do this is to run the task ( using -N) on a node (or very small number of nodes) that has no performance critical role. while this is not perfect, it should limit the impact significantly. . 
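For example, a sketch of that approach with a placeholder file system name (gpfs1) and placeholder node names:

    # optionally throttle the per-node restripe threads first
    # (the -i form takes effect immediately, no restart needed)
    mmchconfig pitWorkerThreadsPerNode=1 -i

    # run the rebalance only on a couple of nodes that serve no NFS
    # or other performance critical workload
    mmrestripefs gpfs1 -b -N nsdutil01,nsdutil02

    # put the thread count back once the restripe has finished
    mmchconfig pitWorkerThreadsPerNode=DEFAULT -i
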
sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ [Beschreibung: Inactive hide details for Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS]Daniel Vogel ---07/01/2015 03:29:11 AM---Hi Years ago, IBM made some plan to do a implementation "QoS for mmrestripefs, mmdeldisk...". If a " From: Daniel Vogel > To: "'gpfsug-discuss at gpfsug.org'" > Date: 07/01/2015 03:29 AM Subject: [gpfsug-discuss] GPFS 4.1.1 without QoS for mmrestripefs? Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Hi Years ago, IBM made some plan to do a implementation ?QoS for mmrestripefs, mmdeldisk??. If a ?mmfsrestripe? is running, very poor performance for NFS access. I opened a PMR to ask for QoS in version 4.1.1 (Spectrum Scale). PMR 61309,113,848: I discussed the question of QOS with the development team. These command changes that were noticed are not meant to be used as GA code which is why they are not documented. I cannot provide any further information from the support perspective. Anybody knows about QoS? The last hope was at ?GPFS Workshop Stuttgart M?rz 2015? with Sven Oehme as speaker. Daniel Vogel IT Consultant ABC SYSTEMS AG Hauptsitz Z?rich R?tistrasse 28 CH - 8952 Schlieren T +41 43 433 6 433 D +41 43 433 6 467 http://www.abcsystems.ch ABC - Always Better Concepts. Approved By Customers since 1981. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 10 15:56:04 2015 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 10 Jul 2015 14:56:04 +0000 Subject: [gpfsug-discuss] Fwd: GPFS 4.1, NFSv4, and authenticating against AD References: <69C83493-2E22-4B11-BF15-A276DA6D4901@vanderbilt.edu> Message-ID: <55426129-67A0-4071-91F4-715BAC1F0DBE@vanderbilt.edu> Begin forwarded message: From: buterbkl > Subject: GPFS 4.1, NFSv4, and authenticating against AD Date: July 10, 2015 at 9:52:38 AM CDT To: gpfs-general at sdsc.edu Hi All, We are under the (hopefully not mistaken) impression that with GPFS 4.1 supporting NFSv4 it should be possible to have a CNFS setup authenticate against Active Directory as long as you use NFSv4. I also thought that I had seen somewhere (possibly one of the two GPFS related mailing lists I?m on, or in a DeveloperWorks article, or ???) that IBM has published documentation on how to set this up (a kind of cookbook). I?ve done a fair amount of Googling looking for such a document, but I seem to be uniquely talented in not being able to find things with Google! :-( Does anyone know of such a document and could send me the link to it? It would be very helpful to us as I?ve got essentially zero experience with Kerberos (which I think is required to talk to AD) and the institutions? AD environment is managed by a separate department. Thanks in advance? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Mon Jul 13 13:31:18 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Mon, 13 Jul 2015 13:31:18 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <559F9960.7010509@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> Message-ID: <55A3AF96.3060303@ebi.ac.uk> Anyone? On 10/07/15 11:07, Salvatore Di Nardo wrote: > Hello guys. > Quite a while ago i mentioned that we have a big expel issue on our > gss ( first gen) and white a lot people suggested that the root cause > could be that we use the same interface for all the traffic, and that > we should split the data network from the admin network. Finally we > could plan a downtime and we are migrating the data out so, i can soon > safelly play with the change, but looking what exactly i should to do > i'm a bit puzzled. Our mmlscluster looks like this: > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name > Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk > quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk > quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk > quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk > quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk > quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk > quorum-manager > > > It was my understanding that the "admin node" should use a different > interface ( a 1g link copper should be fine), while the daemon node is > where the data was passing , so should point to the bonded 10g > interfaces. but when i read the mmchnode man page i start to be quite > confused. It says: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address _*to > be used by the GPFS daemons for node-to-node communication*_. The > host name or IP address must refer to the communication adapter over > which the GPFS daemons communicate. > Alias interfaces are not allowed. Use the > original address or a name that is resolved by the host command to > that original address. > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by > GPFS administration commands when communicating between nodes. The > admin node name must be specified as an IP address or a hostname that > is resolved by the host command > tothe desired IP address. If the keyword > DEFAULT is specified, the admin interface for the node is set to be > equal to the daemon interface for the node. > > What exactly means "node-to node-communications" ? > Means DATA or also the "lease renew", and the token communication > between the clients to get/steal the locks to be able to manage > concurrent write to thr same file? 
> Since we are getting expells ( especially when several clients > contends the same file ) i assumed i have to split this type of > packages from the data stream, but reading the documentation it looks > to me that those internal comunication between nodes use the > daemon-interface wich i suppose are used also for the data. so HOW > exactly i can split them? > > > Thanks in advance, > Salvatore > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Mon Jul 13 14:29:50 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Mon, 13 Jul 2015 14:29:50 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> Message-ID: <55A3BD4E.3000205@ebi.ac.uk> Hello Vic. We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: --daemon-interface={hostname | ip_address} Specifies the host name or IP address _*to be used by the GPFS daemons for node-to-node communication*_. The host name or IP address must refer to the commu- nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. Regards, Salvatore On 13/07/15 14:00, Vic Cornell wrote: > Hi Salavatore, > > Does your GSS have the facility for a 1GbE ?management? network? If so > I think that changing the ?admin? node names of the cluster members to > a set of IPs on the management network would give you the split that > you need. > > What about the clients? Can they also connect to a separate admin network? > > Remember that if you are using multi-cluster all of the nodes in both > networks must share the same admin network. > > Kind Regards, > > Vic > > >> On 13 Jul 2015, at 13:31, Salvatore Di Nardo > > wrote: >> >> Anyone? >> >> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>> Hello guys. 
>>> Quite a while ago i mentioned that we have a big expel issue on our >>> gss ( first gen) and white a lot people suggested that the root >>> cause could be that we use the same interface for all the traffic, >>> and that we should split the data network from the admin network. >>> Finally we could plan a downtime and we are migrating the data out >>> so, i can soon safelly play with the change, but looking what >>> exactly i should to do i'm a bit puzzled. Our mmlscluster looks like >>> this: >>> >>> GPFS cluster information >>> ======================== >>> GPFS cluster name: GSS.ebi.ac.uk >>> GPFS cluster id: 17987981184946329605 >>> GPFS UID domain: GSS.ebi.ac.uk >>> Remote shell command: /usr/bin/ssh >>> Remote file copy command: /usr/bin/scp >>> >>> GPFS cluster configuration servers: >>> ----------------------------------- >>> Primary server: gss01a.ebi.ac.uk >>> Secondary server: gss02b.ebi.ac.uk >>> >>> Node Daemon node name IP address Admin node >>> name Designation >>> ----------------------------------------------------------------------- >>> 1 gss01a.ebi.ac.uk >>> 10.7.28.2 gss01a.ebi.ac.uk >>> quorum-manager >>> 2 gss01b.ebi.ac.uk >>> 10.7.28.3 gss01b.ebi.ac.uk >>> quorum-manager >>> 3 gss02a.ebi.ac.uk >>> 10.7.28.67 gss02a.ebi.ac.uk >>> quorum-manager >>> 4 gss02b.ebi.ac.uk >>> 10.7.28.66 gss02b.ebi.ac.uk >>> quorum-manager >>> 5 gss03a.ebi.ac.uk >>> 10.7.28.34 gss03a.ebi.ac.uk >>> quorum-manager >>> 6 gss03b.ebi.ac.uk >>> 10.7.28.35 gss03b.ebi.ac.uk >>> quorum-manager >>> >>> >>> It was my understanding that the "admin node" should use a different >>> interface ( a 1g link copper should be fine), while the daemon node >>> is where the data was passing , so should point to the bonded 10g >>> interfaces. but when i read the mmchnode man page i start to be >>> quite confused. It says: >>> >>> --daemon-interface={hostname | ip_address} >>> Specifies the host name or IP address >>> _*to be used by the GPFS daemons for node-to-node communication*_. >>> The host name or IP address must refer to the communication adapter >>> over which the GPFS daemons communicate. >>> Alias interfaces are not allowed. Use the >>> original address or a name that is resolved by the host command to >>> that original address. >>> >>> --admin-interface={hostname | ip_address} >>> Specifies the name of the node to be used >>> by GPFS administration commands when communicating between nodes. >>> The admin node name must be specified as an IP address or a hostname >>> that is resolved by the host command >>> tothe desired IP address. If the keyword >>> DEFAULT is specified, the admin interface for the node is set to be >>> equal to the daemon interface for the node. >>> >>> What exactly means "node-to node-communications" ? >>> Means DATA or also the "lease renew", and the token communication >>> between the clients to get/steal the locks to be able to manage >>> concurrent write to thr same file? >>> Since we are getting expells ( especially when several clients >>> contends the same file ) i assumed i have to split this type of >>> packages from the data stream, but reading the documentation it >>> looks to me that those internal comunication between nodes use the >>> daemon-interface wich i suppose are used also for the data. so HOW >>> exactly i can split them? 
>>> >>> >>> Thanks in advance, >>> Salvatore >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss atgpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From viccornell at gmail.com Mon Jul 13 15:25:32 2015 From: viccornell at gmail.com (Vic Cornell) Date: Mon, 13 Jul 2015 15:25:32 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A3BD4E.3000205@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: Hi Salvatore, I agree that that is what the manual - and some of the wiki entries say. However , when we have had problems (typically congestion) with ethernet networks in the past (20GbE or 40GbE) we have resolved them by setting up a separate ?Admin? network. The before and after cluster health we have seen measured in number of expels and waiters has been very marked. Maybe someone ?in the know? could comment on this split. Regards, Vic > On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: > > Hello Vic. > We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP > address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the > node is set to be equal to the daemon interface for the node. > > So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the commu- > nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the > host command to that original address. > > The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. > > > Regards, > Salvatore > > > > On 13/07/15 14:00, Vic Cornell wrote: >> Hi Salavatore, >> >> Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. >> >> What about the clients? Can they also connect to a separate admin network? >> >> Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. 
>> >> Kind Regards, >> >> Vic >> >> >>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo > wrote: >>> >>> Anyone? >>> >>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>> Hello guys. >>>> Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: >>>> >>>> GPFS cluster information >>>> ======================== >>>> GPFS cluster name: GSS.ebi.ac.uk >>>> GPFS cluster id: 17987981184946329605 >>>> GPFS UID domain: GSS.ebi.ac.uk >>>> Remote shell command: /usr/bin/ssh >>>> Remote file copy command: /usr/bin/scp >>>> >>>> GPFS cluster configuration servers: >>>> ----------------------------------- >>>> Primary server: gss01a.ebi.ac.uk >>>> Secondary server: gss02b.ebi.ac.uk >>>> >>>> Node Daemon node name IP address Admin node name Designation >>>> ----------------------------------------------------------------------- >>>> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >>>> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >>>> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >>>> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >>>> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >>>> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >>>> >>>> It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: >>>> >>>> --daemon-interface={hostname | ip_address} >>>> Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. >>>> Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. >>>> >>>> --admin-interface={hostname | ip_address} >>>> Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command >>>> to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. >>>> >>>> What exactly means "node-to node-communications" ? >>>> Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? >>>> Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? 
>>>> >>>> >>>> Thanks in advance, >>>> Salvatore >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jhick at lbl.gov Mon Jul 13 16:22:58 2015 From: jhick at lbl.gov (Jason Hick) Date: Mon, 13 Jul 2015 08:22:58 -0700 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: Hi, Yes having separate data and management networks has been critical for us for keeping health monitoring/communication unimpeded by data movement. Not as important, but you can also tune the networks differently (packet sizes, buffer sizes, SAK, etc) which can help. Jason > On Jul 13, 2015, at 7:25 AM, Vic Cornell wrote: > > Hi Salvatore, > > I agree that that is what the manual - and some of the wiki entries say. > > However , when we have had problems (typically congestion) with ethernet networks in the past (20GbE or 40GbE) we have resolved them by setting up a separate ?Admin? network. > > The before and after cluster health we have seen measured in number of expels and waiters has been very marked. > > Maybe someone ?in the know? could comment on this split. > > Regards, > > Vic > > >> On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: >> >> Hello Vic. >> We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: >> >> --admin-interface={hostname | ip_address} >> Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP >> address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the >> node is set to be equal to the daemon interface for the node. >> >> So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: >> >> --daemon-interface={hostname | ip_address} >> Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the commu- >> nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the >> host command to that original address. >> >> The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. 
>> >> >> Regards, >> Salvatore >> >> >> >>> On 13/07/15 14:00, Vic Cornell wrote: >>> Hi Salavatore, >>> >>> Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. >>> >>> What about the clients? Can they also connect to a separate admin network? >>> >>> Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. >>> >>> Kind Regards, >>> >>> Vic >>> >>> >>>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo wrote: >>>> >>>> Anyone? >>>> >>>>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>>> Hello guys. >>>>> Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: >>>>> >>>>> GPFS cluster information >>>>> ======================== >>>>> GPFS cluster name: GSS.ebi.ac.uk >>>>> GPFS cluster id: 17987981184946329605 >>>>> GPFS UID domain: GSS.ebi.ac.uk >>>>> Remote shell command: /usr/bin/ssh >>>>> Remote file copy command: /usr/bin/scp >>>>> >>>>> GPFS cluster configuration servers: >>>>> ----------------------------------- >>>>> Primary server: gss01a.ebi.ac.uk >>>>> Secondary server: gss02b.ebi.ac.uk >>>>> >>>>> Node Daemon node name IP address Admin node name Designation >>>>> ----------------------------------------------------------------------- >>>>> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >>>>> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >>>>> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >>>>> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >>>>> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >>>>> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >>>>> >>>>> It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: >>>>> >>>>> --daemon-interface={hostname | ip_address} >>>>> Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. >>>>> Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. >>>>> >>>>> --admin-interface={hostname | ip_address} >>>>> Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command >>>>> to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. >>>>> >>>>> What exactly means "node-to node-communications" ? 
>>>>> Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? >>>>> Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? >>>>> >>>>> >>>>> Thanks in advance, >>>>> Salvatore >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdenham at gmail.com Mon Jul 13 17:45:48 2015 From: sdenham at gmail.com (Scott D) Date: Mon, 13 Jul 2015 11:45:48 -0500 Subject: [gpfsug-discuss] data interface and management infercace. Message-ID: I spent a good deal of time exploring this topic when I was at IBM. I think there are two key aspects here; the congestion of the actual interfaces on the [cluster, FS, token] management nodes and competition for other resources like CPU cycles on those nodes. When using a single Ethernet interface (or for that matter IB RDMA + IPoIB over the same interface), at some point the two kinds of traffic begin to conflict. The management traffic being much more time sensitive suffers as a result. One solution is to separate the traffic. For larger clusters though (1000s of nodes), a better solution, that may avoid having to have a 2nd interface on every client node, is to add dedicated nodes as managers and not rely on NSD servers for this. It does cost you some modest servers and GPFS server licenses. My previous client generally used previous-generation retired compute nodes for this job. Scott Date: Mon, 13 Jul 2015 15:25:32 +0100 > From: Vic Cornell > Subject: Re: [gpfsug-discuss] data interface and management infercace. > > Hi Salvatore, > > I agree that that is what the manual - and some of the wiki entries say. > > However , when we have had problems (typically congestion) with ethernet > networks in the past (20GbE or 40GbE) we have resolved them by setting up a > separate ?Admin? network. > > The before and after cluster health we have seen measured in number of > expels and waiters has been very marked. > > Maybe someone ?in the know? could comment on this split. > > Regards, > > Vic > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mhabib73 at gmail.com Mon Jul 13 18:19:36 2015 From: mhabib73 at gmail.com (Muhammad Habib) Date: Mon, 13 Jul 2015 13:19:36 -0400 Subject: [gpfsug-discuss] data interface and management infercace. 
In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: Did you look at "subnets" parameter used with "mmchconfig" command. I think you can use order list of subnets for daemon communication and then actual daemon interface can be used for data transfer. When the GPFS will start it will use actual daemon interface for communication , however , once its started , it will use the IPs from the subnet list whichever coming first in the list. To further validate , you can put network sniffer before you do actual implementation or alternatively you can open a PMR with IBM. If your cluster having expel situation , you may fine tune your cluster e.g. increase ping timeout period , having multiple NSD servers and distributing filesystems across these NSD servers. Also critical servers can have HBA cards installed for direct I/O through fiber. Thanks On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick wrote: > Hi, > > Yes having separate data and management networks has been critical for us > for keeping health monitoring/communication unimpeded by data movement. > > Not as important, but you can also tune the networks differently (packet > sizes, buffer sizes, SAK, etc) which can help. > > Jason > > On Jul 13, 2015, at 7:25 AM, Vic Cornell wrote: > > Hi Salvatore, > > I agree that that is what the manual - and some of the wiki entries say. > > However , when we have had problems (typically congestion) with ethernet > networks in the past (20GbE or 40GbE) we have resolved them by setting up a > separate ?Admin? network. > > The before and after cluster health we have seen measured in number of > expels and waiters has been very marked. > > Maybe someone ?in the know? could comment on this split. > > Regards, > > Vic > > > On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: > > Hello Vic. > We are currently draining our gpfs to do all the recabling to add a > management network, but looking what the admin interface does ( man > mmchnode ) it says something different: > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS > administration commands when communicating between nodes. The admin node > name must be specified as an IP > address or a hostname that is resolved by the > host command to the desired IP address. If the keyword DEFAULT is > specified, the admin interface for the > node is set to be equal to the daemon interface > for the node. > > > So, seems used only for commands propagation, hence have nothing to do > with the node-to-node traffic. Infact the other interface description is: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address *to be > used by the GPFS daemons for node-to-node communication*. The host name > or IP address must refer to the commu- > nication adapter over which the GPFS daemons > communicate. Alias interfaces are not allowed. Use the original address or > a name that is resolved by the > host command to that original address. > > > The "expired lease" issue and file locking mechanism a( most of our > expells happens when 2 clients try to write in the same file) are exactly > node-to node-comunication, so im wondering what's the point to separate > the "admin network". I want to be sure to plan the right changes before we > do a so massive task. We are talking about adding a new interface on 700 > clients, so the recabling work its not small. 
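A minimal sketch of the subnets approach mentioned above (10.30.2.0 is a made-up example standing in for a dedicated high-speed data subnet reachable from all nodes):

    # prefer the listed subnet for GPFS daemon (data) traffic;
    # the daemon addresses shown by mmlscluster remain the fallback
    mmchconfig subnets="10.30.2.0"

    # the setting is picked up when GPFS is restarted on the nodes
    mmlsconfig subnets
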
> > > Regards, > Salvatore > > > > On 13/07/15 14:00, Vic Cornell wrote: > > Hi Salavatore, > > Does your GSS have the facility for a 1GbE ?management? network? If so I > think that changing the ?admin? node names of the cluster members to a set > of IPs on the management network would give you the split that you need. > > What about the clients? Can they also connect to a separate admin > network? > > Remember that if you are using multi-cluster all of the nodes in both > networks must share the same admin network. > > Kind Regards, > > Vic > > > On 13 Jul 2015, at 13:31, Salvatore Di Nardo wrote: > > Anyone? > > On 10/07/15 11:07, Salvatore Di Nardo wrote: > > Hello guys. > Quite a while ago i mentioned that we have a big expel issue on our gss ( > first gen) and white a lot people suggested that the root cause could be > that we use the same interface for all the traffic, and that we should > split the data network from the admin network. Finally we could plan a > downtime and we are migrating the data out so, i can soon safelly play with > the change, but looking what exactly i should to do i'm a bit puzzled. Our > mmlscluster looks like this: > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > It was my understanding that the "admin node" should use a different > interface ( a 1g link copper should be fine), while the daemon node is > where the data was passing , so should point to the bonded 10g interfaces. > but when i read the mmchnode man page i start to be quite confused. It says: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address *to be > used by the GPFS daemons for node-to-node communication*. The host name > or IP address must refer to the communication adapter over which the GPFS > daemons communicate. > Alias interfaces are not allowed. Use the > original address or a name that is resolved by the host command to that > original address. > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS > administration commands when communicating between nodes. The admin node > name must be specified as an IP address or a hostname that is resolved by > the host command > to the desired IP address. If the keyword > DEFAULT is specified, the admin interface for the node is set to be equal > to the daemon interface for the node. > > What exactly means "node-to node-communications" ? > Means DATA or also the "lease renew", and the token communication between > the clients to get/steal the locks to be able to manage concurrent write to > thr same file? 
> Since we are getting expells ( especially when several clients contends > the same file ) i assumed i have to split this type of packages from the > data stream, but reading the documentation it looks to me that those > internal comunication between nodes use the daemon-interface wich i suppose > are used also for the data. so HOW exactly i can split them? > > > Thanks in advance, > Salvatore > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This communication contains confidential information intended only for the persons to whom it is addressed. Any other distribution, copying or disclosure is strictly prohibited. If you have received this communication in error, please notify the sender and delete this e-mail message immediately. Le pr?sent message contient des renseignements de nature confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, distribution, divulgation, utilisation ou reproduction de la pr?sente communication, et de tout fichier qui y est joint, est strictement interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur et supprimer le message de votre ordinateur et de votre serveur. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Mon Jul 13 18:42:47 2015 From: oester at gmail.com (Bob Oesterlin) Date: Mon, 13 Jul 2015 12:42:47 -0500 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: Message-ID: Some thoughts on node expels, based on the last 2-3 months of "expel hell" here. We've spent a lot of time looking at this issue, across multiple clusters. A big thanks to IBM for helping us center in on the right issues. First, you need to understand if the expels are due to "expired lease" message, or expels due to "communication issues". It sounds like you are talking about the latter. In the case of nodes being expelled due to communication issues, it's more likely the problem in related to network congestion. This can occur at many levels - the node, the network, or the switch. When it's a communication issue, changing prams like "missed ping timeout" isn't going to help you. The problem for us ended up being that GPFS wasn't getting a response to a periodic "keep alive" poll to the node, and after 300 seconds, it declared the node dead and expelled it. You can tell if this is the issue by starting to look at the RPC waiters just before the expel. If you see something like "Waiting for poll on sock" RPC, that the node is waiting for that periodic poll to return, and it's not seeing it. The response is either lost in the network, sitting on the network queue, or the node is too busy to send it. 
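A quick way to capture those waiters on a suspect node (a sketch; mmdiag should be available on any reasonably current GPFS level):

    # list the RPCs the GPFS daemon is currently waiting on
    mmdiag --waiters

    # the older, low-level equivalent of the same information
    mmfsadm dump waiters
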
You may also see RPC's like "waiting for exclusive use of connection" RPC - this is another clear indication of network congestion. Look at the GPFSUG presentions (http://www.gpfsug.org/presentations/) for one by Jason Hick (NERSC) - he also talks about these issues. You need to take a look at net.ipv4.tcp_wmem and net.ipv4.tcp_rmem, especially if you have client nodes that are on slower network interfaces. In our case, it was a number of factors - adjusting these settings, looking at congestion at the switch level, and some physical hardware issues. I would be happy to discuss in more detail (offline) if you want). There are no simple solutions. :-) Bob Oesterlin, Sr Storage Engineer, Nuance Communications robert.oesterlin at nuance.com On Mon, Jul 13, 2015 at 11:45 AM, Scott D wrote: > I spent a good deal of time exploring this topic when I was at IBM. I > think there are two key aspects here; the congestion of the actual > interfaces on the [cluster, FS, token] management nodes and competition for > other resources like CPU cycles on those nodes. When using a single > Ethernet interface (or for that matter IB RDMA + IPoIB over the same > interface), at some point the two kinds of traffic begin to conflict. The > management traffic being much more time sensitive suffers as a result. One > solution is to separate the traffic. For larger clusters though (1000s of > nodes), a better solution, that may avoid having to have a 2nd interface on > every client node, is to add dedicated nodes as managers and not rely on > NSD servers for this. It does cost you some modest servers and GPFS server > licenses. My previous client generally used previous-generation retired > compute nodes for this job. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hagley at cscs.ch Tue Jul 14 08:31:04 2015 From: hagley at cscs.ch (Hagley Birgit) Date: Tue, 14 Jul 2015 07:31:04 +0000 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A3BD4E.3000205@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com>, <55A3BD4E.3000205@ebi.ac.uk> Message-ID: <97B2355E006F044E9B8518711889B13719CF3810@MBX114.d.ethz.ch> Hello Salvatore, as you wrote that you have about 700 clients, maybe also the tuning recommendations for large GPFS clusters are helpful for you. They are on the developerworks GPFS wiki: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20Network%20Tuning To my experience especially "failureDetectionTime" and "minMissedPingTimeout" may help in case of expelled nodes. In case you use InfiniBand, for RDMA, there also is a "Best Practices RDMA Tuning" page: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20RDMA%20Tuning Regards Birgit ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Monday, July 13, 2015 3:29 PM To: Vic Cornell Cc: gpfsug main discussion list Subject: Re: [gpfsug-discuss] data interface and management infercace. Hello Vic. 
We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the commu- nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. Regards, Salvatore On 13/07/15 14:00, Vic Cornell wrote: Hi Salavatore, Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. What about the clients? Can they also connect to a separate admin network? Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. Kind Regards, Vic On 13 Jul 2015, at 13:31, Salvatore Di Nardo > wrote: Anyone? On 10/07/15 11:07, Salvatore Di Nardo wrote: Hello guys. Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. 
Our mmlscluster looks like this: GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. What exactly means "node-to node-communications" ? Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? Thanks in advance, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Jul 14 09:15:26 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Jul 2015 09:15:26 +0100 Subject: [gpfsug-discuss] data interface and management infercace. 
In-Reply-To: <97B2355E006F044E9B8518711889B13719CF3810@MBX114.d.ethz.ch> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com>, <55A3BD4E.3000205@ebi.ac.uk> <97B2355E006F044E9B8518711889B13719CF3810@MBX114.d.ethz.ch> Message-ID: <55A4C51E.8050606@ebi.ac.uk> Thanks, this has already been done ( without too much success). We need to rearrange the networking and since somebody experience was to add a copper interface for management i want to do the same, so i'm digging a bit to aundertsand the best way yo do it. Regards, Salvatore On 14/07/15 08:31, Hagley Birgit wrote: > Hello Salvatore, > > as you wrote that you have about 700 clients, maybe also the tuning > recommendations for large GPFS clusters are helpful for you. They are > on the developerworks GPFS wiki: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20Network%20Tuning > > > > To my experience especially "failureDetectionTime" and > "minMissedPingTimeout" may help in case of expelled nodes. > > > In case you use InfiniBand, for RDMA, there also is a "Best Practices > RDMA Tuning" page: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20RDMA%20Tuning > > > > > Regards > Birgit > > ------------------------------------------------------------------------ > *From:* gpfsug-discuss-bounces at gpfsug.org > [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo > [sdinardo at ebi.ac.uk] > *Sent:* Monday, July 13, 2015 3:29 PM > *To:* Vic Cornell > *Cc:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] data interface and management infercace. > > Hello Vic. > We are currently draining our gpfs to do all the recabling to add a > management network, but looking what the admin interface does ( man > mmchnode ) it says something different: > > --admin-interface={hostname | ip_address} > Specifies the name of the node to be used by GPFS > administration commands when communicating between nodes. The > admin node name must be specified as an IP > address or a hostname that is resolved by the host command to > the desired IP address. If the keyword DEFAULT is specified, > the admin interface for the > node is set to be equal to the daemon interface for the node. > > > So, seems used only for commands propagation, hence have nothing to > do with the node-to-node traffic. Infact the other interface > description is: > > --daemon-interface={hostname | ip_address} > Specifies the host name or IP address _*to be used by the GPFS > daemons for node-to-node communication*_. The host name or IP > address must refer to the commu- > nication adapter over which the GPFS daemons communicate. > Alias interfaces are not allowed. Use the original address or > a name that is resolved by the > host command to that original address. > > > The "expired lease" issue and file locking mechanism a( most of our > expells happens when 2 clients try to write in the same file) are > exactly node-to node-comunication, so im wondering what's the point to > separate the "admin network". I want to be sure to plan the right > changes before we do a so massive task. We are talking about adding a > new interface on 700 clients, so the recabling work its not small. 
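
(To make that concrete for myself: my understanding is that, once the copper interfaces are cabled and have their own hostnames, the actual switch is one mmchnode call per node, along these lines -- the -mgmt names below are invented and I haven't run this yet, so treat it as a sketch and check the mmchnode man page for whether GPFS has to be down on the node first:

   mmchnode --admin-interface=gss01a-mgmt.ebi.ac.uk -N gss01a.ebi.ac.uk
   mmchnode --admin-interface=gss01b-mgmt.ebi.ac.uk -N gss01b.ebi.ac.uk
   # ...and so on for the other server nodes and, eventually, the clients

while the daemon interface would stay on the bonded 10g links.)
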
> > > Regards, > Salvatore > > > > On 13/07/15 14:00, Vic Cornell wrote: >> Hi Salavatore, >> >> Does your GSS have the facility for a 1GbE ?management? network? If >> so I think that changing the ?admin? node names of the cluster >> members to a set of IPs on the management network would give you the >> split that you need. >> >> What about the clients? Can they also connect to a separate admin >> network? >> >> Remember that if you are using multi-cluster all of the nodes in both >> networks must share the same admin network. >> >> Kind Regards, >> >> Vic >> >> >>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo >> > wrote: >>> >>> Anyone? >>> >>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>> Hello guys. >>>> Quite a while ago i mentioned that we have a big expel issue on >>>> our gss ( first gen) and white a lot people suggested that the root >>>> cause could be that we use the same interface for all the traffic, >>>> and that we should split the data network from the admin network. >>>> Finally we could plan a downtime and we are migrating the data out >>>> so, i can soon safelly play with the change, but looking what >>>> exactly i should to do i'm a bit puzzled. Our mmlscluster looks >>>> like this: >>>> >>>> GPFS cluster information >>>> ======================== >>>> GPFS cluster name: GSS.ebi.ac.uk >>>> GPFS cluster id: 17987981184946329605 >>>> GPFS UID domain: GSS.ebi.ac.uk >>>> Remote shell command: /usr/bin/ssh >>>> Remote file copy command: /usr/bin/scp >>>> >>>> GPFS cluster configuration servers: >>>> ----------------------------------- >>>> Primary server: gss01a.ebi.ac.uk >>>> Secondary server: gss02b.ebi.ac.uk >>>> >>>> >>>> Node Daemon node name IP address Admin node >>>> name Designation >>>> ----------------------------------------------------------------------- >>>> 1 gss01a.ebi.ac.uk >>>> 10.7.28.2 gss01a.ebi.ac.uk >>>> quorum-manager >>>> 2 gss01b.ebi.ac.uk >>>> 10.7.28.3 gss01b.ebi.ac.uk >>>> quorum-manager >>>> 3 gss02a.ebi.ac.uk >>>> 10.7.28.67 gss02a.ebi.ac.uk >>>> quorum-manager >>>> 4 gss02b.ebi.ac.uk >>>> 10.7.28.66 gss02b.ebi.ac.uk >>>> quorum-manager >>>> 5 gss03a.ebi.ac.uk >>>> 10.7.28.34 gss03a.ebi.ac.uk >>>> quorum-manager >>>> 6 gss03b.ebi.ac.uk >>>> 10.7.28.35 gss03b.ebi.ac.uk >>>> quorum-manager >>>> >>>> >>>> It was my understanding that the "admin node" should use a >>>> different interface ( a 1g link copper should be fine), while the >>>> daemon node is where the data was passing , so should point to the >>>> bonded 10g interfaces. but when i read the mmchnode man page i >>>> start to be quite confused. It says: >>>> >>>> --daemon-interface={hostname | ip_address} >>>> Specifies the host name or IP address _*to be used by the GPFS >>>> daemons for node-to-node communication*_. The host name or IP >>>> address must refer to the communication adapter over which the GPFS >>>> daemons communicate. >>>> Alias interfaces are not allowed. Use the >>>> original address or a name that is resolved by the host command to >>>> that original address. >>>> >>>> --admin-interface={hostname | ip_address} >>>> Specifies the name of the node to be used by GPFS administration >>>> commands when communicating between nodes. The admin node name must >>>> be specified as an IP address or a hostname that is resolved by >>>> the host command >>>> tothe desired IP address. If the keyword >>>> DEFAULT is specified, the admin interface for the node is set to be >>>> equal to the daemon interface for the node. >>>> >>>> What exactly means "node-to node-communications" ? 
>>>> Means DATA or also the "lease renew", and the token communication >>>> between the clients to get/steal the locks to be able to manage >>>> concurrent write to thr same file? >>>> Since we are getting expells ( especially when several clients >>>> contends the same file ) i assumed i have to split this type of >>>> packages from the data stream, but reading the documentation it >>>> looks to me that those internal comunication between nodes use the >>>> daemon-interface wich i suppose are used also for the data. so HOW >>>> exactly i can split them? >>>> >>>> >>>> Thanks in advance, >>>> Salvatore >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss atgpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Tue Jul 14 16:11:51 2015 From: jtucker at pixitmedia.com (Jez Tucker) Date: Tue, 14 Jul 2015 16:11:51 +0100 Subject: [gpfsug-discuss] Vim highlighting for GPFS available Message-ID: <55A526B7.6080602@pixitmedia.com> Hi everyone, I've released vim highlighting for GPFS policies as a public git repo. https://github.com/arcapix/vim-gpfs Pull requests welcome. Please enjoy your new colourful world. Jez p.s. Apologies to Emacs users. Head of R&D ArcaStream/Pixit Media -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. From jonbernard at gmail.com Wed Jul 15 09:19:49 2015 From: jonbernard at gmail.com (Jon Bernard) Date: Wed, 15 Jul 2015 10:19:49 +0200 Subject: [gpfsug-discuss] GPFS UG 10 Presentations - Sven Oehme In-Reply-To: References: Message-ID: If I may revive this: is trcio publicly available? Jon Bernard On Fri, May 2, 2014 at 5:06 PM, Bob Oesterlin wrote: > It Sven's presentation, he mentions a tools "trcio" (in > /xcat/oehmes/gpfs-clone) > > Where can I find that? > > Bob Oesterlin > > > > On Fri, May 2, 2014 at 9:49 AM, Jez Tucker (Chair) > wrote: > >> Hello all >> >> Firstly, thanks for the feedback we've had so far. Very much >> appreciated. >> >> Secondly, GPFS UG 10 Presentations are now available on the Presentations >> section of the website. >> Any outstanding presentations will follow shortly. 
>> >> See: http://www.gpfsug.org/ >> >> Best regards, >> >> Jez >> >> UG Chair >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Wed Jul 15 10:19:58 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 15 Jul 2015 10:19:58 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> Message-ID: <55A625BE.9000809@ebi.ac.uk> Thanks for the input.. this is actually very interesting! Reading here: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview , specifically the " Using more than one network" part it seems to me that this way we should be able to split the lease/token/ping from the data. Supposing that I implement a GSS cluster with only NDS and a second cluster with only clients: As far i understood if on the NDS cluster add first the subnet 10.20.0.0/16 and then 10.30.0.0 is should use the internal network for all the node-to-node comunication, leaving the 10.30.0.0/30 only for data traffic witht he remote cluster ( the clients). Similarly, in the client cluster, adding first 10.10.0.0/16 and then 10.30.0.0, will guarantee than the node-to-node comunication pass trough a different interface there the data is passing. Since the client are just "clients" the traffic trough 10.10.0.0/16 should be minimal (only token ,lease, ping and so on ) and not affected by the rest. Should be possible at this point move aldo the "admin network" on the internal interface, so we effectively splitted all the "non data" traffic on a dedicated interface. I'm wondering if I'm missing something, and in case i didn't, what could be the real traffic in the internal (black) networks ( 1g link its fine or i still need 10g for that). Another thing I I'm wondering its the load of the "non data" traffic between the clusters.. i suppose some "daemon traffic" goes trough the blue interface for the inter-cluster communication. Any thoughts ? Salvatore On 13/07/15 18:19, Muhammad Habib wrote: > Did you look at "subnets" parameter used with "mmchconfig" command. I > think you can use order list of subnets for daemon communication and > then actual daemon interface can be used for data transfer. When the > GPFS will start it will use actual daemon interface for communication > , however , once its started , it will use the IPs from the subnet > list whichever coming first in the list. To further validate , you > can put network sniffer before you do actual implementation or > alternatively you can open a PMR with IBM. > > If your cluster having expel situation , you may fine tune your > cluster e.g. increase ping timeout period , having multiple NSD > servers and distributing filesystems across these NSD servers. Also > critical servers can have HBA cards installed for direct I/O through > fiber. 
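
(To put some commands behind the picture above -- untested on our side, and the addresses are just the example subnets from my diagram, so please read this as a sketch rather than a recipe:

   # on the GSS/NSD cluster: prefer the internal 10.20.0.0 network for daemon-to-daemon
   # chatter, falling back to the shared 10.30.0.0 network for the data path to the clients
   mmchconfig subnets="10.20.0.0 10.30.0.0"

   # on the client cluster: prefer its own internal 10.10.0.0 network first
   mmchconfig subnets="10.10.0.0 10.30.0.0"

As I understand it the order of the list is what matters, since GPFS takes the first subnet both nodes can reach; the exact syntax, whether a /clustername qualifier is wanted for the remote cluster, and whether the daemons need a restart to pick it up are all things I still have to check against the mmchconfig man page. Afterwards "mmdiag --network" on a few nodes, or a quick tcpdump, should show which interface the daemon traffic is really using.)
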
> > Thanks > > On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick > wrote: > > Hi, > > Yes having separate data and management networks has been critical > for us for keeping health monitoring/communication unimpeded by > data movement. > > Not as important, but you can also tune the networks differently > (packet sizes, buffer sizes, SAK, etc) which can help. > > Jason > > On Jul 13, 2015, at 7:25 AM, Vic Cornell > wrote: > >> Hi Salvatore, >> >> I agree that that is what the manual - and some of the wiki >> entries say. >> >> However , when we have had problems (typically congestion) with >> ethernet networks in the past (20GbE or 40GbE) we have resolved >> them by setting up a separate ?Admin? network. >> >> The before and after cluster health we have seen measured in >> number of expels and waiters has been very marked. >> >> Maybe someone ?in the know? could comment on this split. >> >> Regards, >> >> Vic >> >> >>> On 13 Jul 2015, at 14:29, Salvatore Di Nardo >> > wrote: >>> >>> Hello Vic. >>> We are currently draining our gpfs to do all the recabling to >>> add a management network, but looking what the admin interface >>> does ( man mmchnode ) it says something different: >>> >>> --admin-interface={hostname | ip_address} >>> Specifies the name of the node to be used by GPFS >>> administration commands when communicating between >>> nodes. The admin node name must be specified as an IP >>> address or a hostname that is resolved by the host >>> command to the desired IP address. If the keyword >>> DEFAULT is specified, the admin interface for the >>> node is set to be equal to the daemon interface for the >>> node. >>> >>> >>> So, seems used only for commands propagation, hence have >>> nothing to do with the node-to-node traffic. Infact the other >>> interface description is: >>> >>> --daemon-interface={hostname | ip_address} >>> Specifies the host name or IP address _*to be used by >>> the GPFS daemons for node-to-node communication*_. The >>> host name or IP address must refer to the commu- >>> nication adapter over which the GPFS daemons >>> communicate. Alias interfaces are not allowed. Use the >>> original address or a name that is resolved by the >>> host command to that original address. >>> >>> >>> The "expired lease" issue and file locking mechanism a( most of >>> our expells happens when 2 clients try to write in the same >>> file) are exactly node-to node-comunication, so im wondering >>> what's the point to separate the "admin network". I want to be >>> sure to plan the right changes before we do a so massive task. >>> We are talking about adding a new interface on 700 clients, so >>> the recabling work its not small. >>> >>> >>> Regards, >>> Salvatore >>> >>> >>> >>> On 13/07/15 14:00, Vic Cornell wrote: >>>> Hi Salavatore, >>>> >>>> Does your GSS have the facility for a 1GbE ?management? >>>> network? If so I think that changing the ?admin? node names of >>>> the cluster members to a set of IPs on the management network >>>> would give you the split that you need. >>>> >>>> What about the clients? Can they also connect to a separate >>>> admin network? >>>> >>>> Remember that if you are using multi-cluster all of the nodes >>>> in both networks must share the same admin network. >>>> >>>> Kind Regards, >>>> >>>> Vic >>>> >>>> >>>>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo >>>>> > wrote: >>>>> >>>>> Anyone? >>>>> >>>>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>>>> Hello guys. 
>>>>>> Quite a while ago i mentioned that we have a big expel issue >>>>>> on our gss ( first gen) and white a lot people suggested that >>>>>> the root cause could be that we use the same interface for >>>>>> all the traffic, and that we should split the data network >>>>>> from the admin network. Finally we could plan a downtime and >>>>>> we are migrating the data out so, i can soon safelly play >>>>>> with the change, but looking what exactly i should to do i'm >>>>>> a bit puzzled. Our mmlscluster looks like this: >>>>>> >>>>>> GPFS cluster information >>>>>> ======================== >>>>>> GPFS cluster name: GSS.ebi.ac.uk >>>>>> >>>>>> GPFS cluster id: 17987981184946329605 >>>>>> GPFS UID domain: GSS.ebi.ac.uk >>>>>> >>>>>> Remote shell command: /usr/bin/ssh >>>>>> Remote file copy command: /usr/bin/scp >>>>>> >>>>>> GPFS cluster configuration servers: >>>>>> ----------------------------------- >>>>>> Primary server: gss01a.ebi.ac.uk >>>>>> >>>>>> Secondary server: gss02b.ebi.ac.uk >>>>>> >>>>>> >>>>>> Node Daemon node name IP address Admin node >>>>>> name Designation >>>>>> ----------------------------------------------------------------------- >>>>>> 1 gss01a.ebi.ac.uk >>>>>> 10.7.28.2 gss01a.ebi.ac.uk >>>>>> quorum-manager >>>>>> 2 gss01b.ebi.ac.uk >>>>>> 10.7.28.3 gss01b.ebi.ac.uk >>>>>> quorum-manager >>>>>> 3 gss02a.ebi.ac.uk >>>>>> 10.7.28.67 gss02a.ebi.ac.uk >>>>>> quorum-manager >>>>>> 4 gss02b.ebi.ac.uk >>>>>> 10.7.28.66 gss02b.ebi.ac.uk >>>>>> quorum-manager >>>>>> 5 gss03a.ebi.ac.uk >>>>>> 10.7.28.34 gss03a.ebi.ac.uk >>>>>> quorum-manager >>>>>> 6 gss03b.ebi.ac.uk >>>>>> 10.7.28.35 gss03b.ebi.ac.uk >>>>>> quorum-manager >>>>>> >>>>>> >>>>>> It was my understanding that the "admin node" should use a >>>>>> different interface ( a 1g link copper should be fine), while >>>>>> the daemon node is where the data was passing , so should >>>>>> point to the bonded 10g interfaces. but when i read the >>>>>> mmchnode man page i start to be quite confused. It says: >>>>>> >>>>>> --daemon-interface={hostname | ip_address} >>>>>> Specifies the host name or IP address _*to be used by the >>>>>> GPFS daemons for node-to-node communication*_. The host name >>>>>> or IP address must refer to the communication adapter over >>>>>> which the GPFS daemons communicate. >>>>>> Alias interfaces are not allowed. Use the original address or >>>>>> a name that is resolved by the host command to that original >>>>>> address. >>>>>> >>>>>> --admin-interface={hostname | ip_address} >>>>>> Specifies the name of the node to be used by GPFS >>>>>> administration commands when communicating between nodes. The >>>>>> admin node name must be specified as an IP address or a >>>>>> hostname that is resolved by the host command >>>>>> tothe desired IP address. If the >>>>>> keyword DEFAULT is specified, the admin interface for the >>>>>> node is set to be equal to the daemon interface for the node. >>>>>> >>>>>> What exactly means "node-to node-communications" ? >>>>>> Means DATA or also the "lease renew", and the token >>>>>> communication between the clients to get/steal the locks to >>>>>> be able to manage concurrent write to thr same file? >>>>>> Since we are getting expells ( especially when several >>>>>> clients contends the same file ) i assumed i have to split >>>>>> this type of packages from the data stream, but reading the >>>>>> documentation it looks to me that those internal comunication >>>>>> between nodes use the daemon-interface wich i suppose are >>>>>> used also for the data. 
so HOW exactly i can split them? >>>>>> >>>>>> >>>>>> Thanks in advance, >>>>>> Salvatore >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> gpfsug-discuss mailing list >>>>>> gpfsug-discuss atgpfsug.org >>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -- > This communication contains confidential information intended only for > the persons to whom it is addressed. Any other distribution, copying > or disclosure is strictly prohibited. If you have received this > communication in error, please notify the sender and delete this > e-mail message immediately. > > Le pr?sent message contient des renseignements de nature > confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute > diffusion, distribution, divulgation, utilisation ou reproduction de > la pr?sente communication, et de tout fichier qui y est joint, est > strictement interdite. Si vous avez re?u le pr?sent message > ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur > et supprimer le message de votre ordinateur et de votre serveur. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpfs.jpg Type: image/jpeg Size: 28904 bytes Desc: not available URL: From oehmes at gmail.com Wed Jul 15 15:33:11 2015 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 15 Jul 2015 14:33:11 +0000 Subject: [gpfsug-discuss] GPFS UG 10 Presentations - Sven Oehme In-Reply-To: References: Message-ID: Hi Jon, the answer is no, its an development internal tool. sven On Wed, Jul 15, 2015 at 1:20 AM Jon Bernard wrote: > If I may revive this: is trcio publicly available? > > Jon Bernard > > On Fri, May 2, 2014 at 5:06 PM, Bob Oesterlin wrote: > >> It Sven's presentation, he mentions a tools "trcio" (in >> /xcat/oehmes/gpfs-clone) >> >> Where can I find that? >> >> Bob Oesterlin >> >> >> >> On Fri, May 2, 2014 at 9:49 AM, Jez Tucker (Chair) >> wrote: >> >>> Hello all >>> >>> Firstly, thanks for the feedback we've had so far. Very much >>> appreciated. >>> >>> Secondly, GPFS UG 10 Presentations are now available on the >>> Presentations section of the website. >>> Any outstanding presentations will follow shortly. 
>>> >>> See: http://www.gpfsug.org/ >>> >>> Best regards, >>> >>> Jez >>> >>> UG Chair >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Jul 15 15:37:57 2015 From: ewahl at osc.edu (Wahl, Edward) Date: Wed, 15 Jul 2015 14:37:57 +0000 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A625BE.9000809@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> , <55A625BE.9000809@ebi.ac.uk> Message-ID: <9DA9EC7A281AC7428A9618AFDC49049955A606E4@CIO-KRC-D1MBX02.osuad.osu.edu> I don't see this in the thread but perhaps I missed it, what version are you running? I'm still on 3.5 so this is all based on that. A few notes for a little "heads up" here hoping to help with the pitfalls. I seem to recall a number of caveats when I did this a while back. Such as using the 'subnets' option being discussed, stops GPFS from failing over to other TCP networks when there are failures. VERY important! 'mmdiag --network' will show your setup. Definitely verify this if failing downwards is in your plans. We fail from 56Gb RDMA->10GbE TCP-> 1GbE here. And having had it work during some bad power events last year it was VERY nice that the users only noticed a slowdown when we completely lost Lustre and other resources. Also I recall that there was a restriction on having multiple private networks, and some special switch to force this. I have a note about "privateSubnetOverride" so you might read up about this. I seem to recall this was for TCP connections and daemonnodename being a private IP. Or maybe it was that AND mmlscluster having private IPs as well? I think the developerworks wiki had some writeup on this. I don't see it in the admin manuals. Hopefully this may help as you plan this out. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Wednesday, July 15, 2015 5:19 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] data interface and management infercace. Thanks for the input.. this is actually very interesting! Reading here: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview , specifically the " Using more than one network" part it seems to me that this way we should be able to split the lease/token/ping from the data. Supposing that I implement a GSS cluster with only NDS and a second cluster with only clients: [cid:part1.03040109.00080709 at ebi.ac.uk] As far i understood if on the NDS cluster add first the subnet 10.20.0.0/16 and then 10.30.0.0 is should use the internal network for all the node-to-node comunication, leaving the 10.30.0.0/30 only for data traffic witht he remote cluster ( the clients). 
Similarly, in the client cluster, adding first 10.10.0.0/16 and then 10.30.0.0, will guarantee than the node-to-node comunication pass trough a different interface there the data is passing. Since the client are just "clients" the traffic trough 10.10.0.0/16 should be minimal (only token ,lease, ping and so on ) and not affected by the rest. Should be possible at this point move aldo the "admin network" on the internal interface, so we effectively splitted all the "non data" traffic on a dedicated interface. I'm wondering if I'm missing something, and in case i didn't, what could be the real traffic in the internal (black) networks ( 1g link its fine or i still need 10g for that). Another thing I I'm wondering its the load of the "non data" traffic between the clusters.. i suppose some "daemon traffic" goes trough the blue interface for the inter-cluster communication. Any thoughts ? Salvatore On 13/07/15 18:19, Muhammad Habib wrote: Did you look at "subnets" parameter used with "mmchconfig" command. I think you can use order list of subnets for daemon communication and then actual daemon interface can be used for data transfer. When the GPFS will start it will use actual daemon interface for communication , however , once its started , it will use the IPs from the subnet list whichever coming first in the list. To further validate , you can put network sniffer before you do actual implementation or alternatively you can open a PMR with IBM. If your cluster having expel situation , you may fine tune your cluster e.g. increase ping timeout period , having multiple NSD servers and distributing filesystems across these NSD servers. Also critical servers can have HBA cards installed for direct I/O through fiber. Thanks On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick > wrote: Hi, Yes having separate data and management networks has been critical for us for keeping health monitoring/communication unimpeded by data movement. Not as important, but you can also tune the networks differently (packet sizes, buffer sizes, SAK, etc) which can help. Jason On Jul 13, 2015, at 7:25 AM, Vic Cornell > wrote: Hi Salvatore, I agree that that is what the manual - and some of the wiki entries say. However , when we have had problems (typically congestion) with ethernet networks in the past (20GbE or 40GbE) we have resolved them by setting up a separate ?Admin? network. The before and after cluster health we have seen measured in number of expels and waiters has been very marked. Maybe someone ?in the know? could comment on this split. Regards, Vic On 13 Jul 2015, at 14:29, Salvatore Di Nardo > wrote: Hello Vic. We are currently draining our gpfs to do all the recabling to add a management network, but looking what the admin interface does ( man mmchnode ) it says something different: --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. So, seems used only for commands propagation, hence have nothing to do with the node-to-node traffic. Infact the other interface description is: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. 
The host name or IP address must refer to the commu- nication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. The "expired lease" issue and file locking mechanism a( most of our expells happens when 2 clients try to write in the same file) are exactly node-to node-comunication, so im wondering what's the point to separate the "admin network". I want to be sure to plan the right changes before we do a so massive task. We are talking about adding a new interface on 700 clients, so the recabling work its not small. Regards, Salvatore On 13/07/15 14:00, Vic Cornell wrote: Hi Salavatore, Does your GSS have the facility for a 1GbE ?management? network? If so I think that changing the ?admin? node names of the cluster members to a set of IPs on the management network would give you the split that you need. What about the clients? Can they also connect to a separate admin network? Remember that if you are using multi-cluster all of the nodes in both networks must share the same admin network. Kind Regards, Vic On 13 Jul 2015, at 13:31, Salvatore Di Nardo > wrote: Anyone? On 10/07/15 11:07, Salvatore Di Nardo wrote: Hello guys. Quite a while ago i mentioned that we have a big expel issue on our gss ( first gen) and white a lot people suggested that the root cause could be that we use the same interface for all the traffic, and that we should split the data network from the admin network. Finally we could plan a downtime and we are migrating the data out so, i can soon safelly play with the change, but looking what exactly i should to do i'm a bit puzzled. Our mmlscluster looks like this: GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager It was my understanding that the "admin node" should use a different interface ( a 1g link copper should be fine), while the daemon node is where the data was passing , so should point to the bonded 10g interfaces. but when i read the mmchnode man page i start to be quite confused. It says: --daemon-interface={hostname | ip_address} Specifies the host name or IP address to be used by the GPFS daemons for node-to-node communication. The host name or IP address must refer to the communication adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. --admin-interface={hostname | ip_address} Specifies the name of the node to be used by GPFS administration commands when communicating between nodes. 
The admin node name must be specified as an IP address or a hostname that is resolved by the host command to the desired IP address. If the keyword DEFAULT is specified, the admin interface for the node is set to be equal to the daemon interface for the node. What exactly means "node-to node-communications" ? Means DATA or also the "lease renew", and the token communication between the clients to get/steal the locks to be able to manage concurrent write to thr same file? Since we are getting expells ( especially when several clients contends the same file ) i assumed i have to split this type of packages from the data stream, but reading the documentation it looks to me that those internal comunication between nodes use the daemon-interface wich i suppose are used also for the data. so HOW exactly i can split them? Thanks in advance, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- This communication contains confidential information intended only for the persons to whom it is addressed. Any other distribution, copying or disclosure is strictly prohibited. If you have received this communication in error, please notify the sender and delete this e-mail message immediately. Le pr?sent message contient des renseignements de nature confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, distribution, divulgation, utilisation ou reproduction de la pr?sente communication, et de tout fichier qui y est joint, est strictement interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur et supprimer le message de votre ordinateur et de votre serveur. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpfs.jpg Type: image/jpeg Size: 28904 bytes Desc: gpfs.jpg URL: From S.J.Thompson at bham.ac.uk Sun Jul 19 11:45:09 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sun, 19 Jul 2015 10:45:09 +0000 Subject: [gpfsug-discuss] 4.1.1 immutable filesets Message-ID: I was wondering if anyone had looked at the immutable fileset features in 4.1.1? In particular I was looking at the iam compliant mode, but I've a couple of questions. * if I have an iam compliant fileset, and it contains immutable files or directories, can I still unlink and delete the filset? * will HSM work with immutable files? I.e. Can I migrate files to tape and restore them? The docs mention that extended attributes can be updated internally by dmapi, so I guess HSM might work? 
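
For context, the sort of setup I have in mind is just the following (the file system and fileset names are made up, and this is only my reading of the 4.1.1 docs rather than something I've run in anger):

   mmcrfileset gpfs01 retain01 --inode-space new
   mmlinkfileset gpfs01 retain01 -J /gpfs/gpfs01/retain01
   mmchfileset gpfs01 retain01 --iam-mode compliant

   # per file: set a retention (expiration) time via atime, then flip the file to immutable
   touch -at 201607190000 /gpfs/gpfs01/retain01/somefile
   mmchattr -i yes /gpfs/gpfs01/retain01/somefile

so the questions above are really about what I can still do with retain01, and the files in it, once that has been done.
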
Thanks Simon From kraemerf at de.ibm.com Sun Jul 19 13:45:35 2015 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Sun, 19 Jul 2015 14:45:35 +0200 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: Message-ID: >I was wondering if anyone had looked at the immutable fileset features in 4.1.1? yes, Nils Haustein has see: https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insight_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From S.J.Thompson at bham.ac.uk Sun Jul 19 14:35:47 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sun, 19 Jul 2015 13:35:47 +0000 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: , Message-ID: Hi Frank, Yeah id read that this.morning, which is why I was asking... I couldn't see anything about HSM in there or if its possible to delete a fileset with immutable files. I remember Scott (maybe) mentioning it at the gpfs ug meeting in York, but I thought that was immutable file systems, which you have to destroy. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Frank Kraemer [kraemerf at de.ibm.com] Sent: 19 July 2015 13:45 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Immutable fileset features >I was wondering if anyone had looked at the immutable fileset features in 4.1.1? yes, Nils Haustein has see: https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insight_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Sun Jul 19 21:09:26 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sun, 19 Jul 2015 20:09:26 +0000 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: Message-ID: On 19/07/2015 13:45, "Frank Kraemer" wrote: >>I was wondering if anyone had looked at the immutable fileset features in >4.1.1? > >yes, Nils Haustein has see: > >https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insi >ght_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function I was re-reading some of this blog post and am a little confused. It talks about setting retention times by setting the ATIME from touch, or by using -E to mmchattr. Does that mean if a file is accessed, then the ATIME is updated and so the retention period is changed? What if our retention policy is based on last access time of file +period of time. I was thinking it would be useful to do a policy scan to find newly access files and then set the retention (either directly by policy if possible? Or by passing the file list to a script). Would this work or if the ATIME is overloaded, then I guess we can?t use this? Finally, is this a feature that is supported by IBM? 
The -E flag for mmchattr is neither in the man page nor the online docs at: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect rum.scale.v4r11.adm.doc/bl1adm_mmchattr.htm (My possibly incorrect understanding was that if its documented, then is supported, otherwise it might work)? Simon From jamiedavis at us.ibm.com Mon Jul 20 13:26:17 2015 From: jamiedavis at us.ibm.com (James Davis) Date: Mon, 20 Jul 2015 08:26:17 -0400 Subject: [gpfsug-discuss] Immutable fileset features In-Reply-To: References: Message-ID: <201507200027.t6K0RD8b003417@d01av02.pok.ibm.com> Simon, I spoke to a tester who worked on this line item. She thinks mmchattr -E should have been documented. We will follow up. If it was an oversight it should be corrected soon. Jamie Jamie Davis GPFS Functional Verification Test (FVT) jamiedavis at us.ibm.com From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 19-07-15 04:09 PM Subject: Re: [gpfsug-discuss] Immutable fileset features Sent by: gpfsug-discuss-bounces at gpfsug.org On 19/07/2015 13:45, "Frank Kraemer" wrote: >>I was wondering if anyone had looked at the immutable fileset features in >4.1.1? > >yes, Nils Haustein has see: > >https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Insi >ght_to_the_IBM_Spectrum_Scale_GPFS_Immutability_function I was re-reading some of this blog post and am a little confused. It talks about setting retention times by setting the ATIME from touch, or by using -E to mmchattr. Does that mean if a file is accessed, then the ATIME is updated and so the retention period is changed? What if our retention policy is based on last access time of file +period of time. I was thinking it would be useful to do a policy scan to find newly access files and then set the retention (either directly by policy if possible? Or by passing the file list to a script). Would this work or if the ATIME is overloaded, then I guess we can?t use this? Finally, is this a feature that is supported by IBM? The -E flag for mmchattr is neither in the man page nor the online docs at: http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spect rum.scale.v4r11.adm.doc/bl1adm_mmchattr.htm (My possibly incorrect understanding was that if its documented, then is supported, otherwise it might work)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Luke.Raimbach at crick.ac.uk Mon Jul 20 08:02:01 2015 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Mon, 20 Jul 2015 07:02:01 +0000 Subject: [gpfsug-discuss] 4.1.1 immutable filesets In-Reply-To: References: Message-ID: Can I add to this list of questions? Apparently, one cannot set immutable, or append-only attributes on files / directories within an AFM cache. However, if I have an independent writer and set immutability at home, what does the AFM IW cache do about this? Or does this restriction just apply to entire filesets (which would make more sense)? Cheers, Luke. 
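
P.S. If anyone gets a chance to poke at this before I do, I guess the quick check on the cache side is simply to make a file immutable at home and then look at a cached copy with something like "mmlsattr -L /path/in/cache/somefile" to see whether the immutable flag (and any expiration time) follows it over -- the path is obviously made up and I haven't tried this yet.
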
-----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: 19 July 2015 11:45 To: gpfsug main discussion list Subject: [gpfsug-discuss] 4.1.1 immutable filesets I was wondering if anyone had looked at the immutable fileset features in 4.1.1? In particular I was looking at the iam compliant mode, but I've a couple of questions. * if I have an iam compliant fileset, and it contains immutable files or directories, can I still unlink and delete the filset? * will HSM work with immutable files? I.e. Can I migrate files to tape and restore them? The docs mention that extended attributes can be updated internally by dmapi, so I guess HSM might work? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From kallbac at iu.edu Wed Jul 22 11:50:58 2015 From: kallbac at iu.edu (Kristy Kallback-Rose) Date: Wed, 22 Jul 2015 06:50:58 -0400 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> Message-ID: <203758A6-C7E0-4D3F-BA31-A130CF92DCBC@iu.edu> Yes interested, please post. We?ll probably keep running Samba separately, as we do today, for quite some time, but will be facing this transition at some point so we can be supported by IBM for Samba. On Jul 10, 2015, at 8:06 AM, Simon Thompson (Research Computing - IT Services) wrote: > So IBM came back and said what I was doing wasn?t supported. > > They did say that you can use ?user defined? authentication. Which I?ve > got working now on my environment (figured what I was doing wrong, and you > can?t use mmsmb to do some of the bits I need for it to work for user > defined mode for me...). But I still think it needs a patch to one of the > files for CES for use in user defined authentication. (Right now it > appears to remove all my ?user defined? settings from nsswitch.conf when > you stop CES/GPFS on a node). I?ve supplied my patch to IBM which works > for my case, we?ll see what they do about it? > > (If people are interested, I?ll gather my notes into a blog post). > > Simon > > On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: > >> Just to chime in as another interested party, we do something fairly >> similar but use sssd instead of nslcd. Very interested to see how >> accommodating the IBM Samba is to local configuration needs. >> >> Best, >> Kristy >> >> On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >> Services) wrote: >> >>> Hi, >>> >>> (sorry, lots of questions about this stuff at the moment!) >>> >>> I?m currently looking at removing the sernet smb configs we had >>> previously >>> and moving to IBM SMB. I?ve removed all the old packages and only now >>> have >>> gpfs.smb installed on the systems. >>> >>> I?m struggling to get the config tools to work for our environment. >>> >>> We have MS Windows AD Domain for authentication. For various reasons, >>> however doesn?t hold the UIDs/GIDs, which are instead held in a >>> different >>> LDAP directory. >>> >>> In the past, we?d configure the Linux servers running Samba so that >>> NSLCD >>> was configured to get details from the LDAP server. (e.g. 
getent passwd >>> would return the data for an AD user). The Linux boxes would also be >>> configured to use KRB5 authentication where users were allowed to ssh >>> etc >>> in for password authentication. >>> >>> So as far as Samba was concerned, it would do ?security = ADS? and then >>> we?d also have "idmap config * : backend = tdb2? >>> >>> I.e. Use Domain for authentication, but look locally for ID mapping >>> data. >>> >>> Now I can configured IBM SMB to use ADS for authentication: >>> >>> mmuserauth service create --type ad --data-access-method file >>> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >>> --idmap-role subordinate >>> >>> >>> However I can?t see anyway for me to manipulate the config so that it >>> doesn?t use autorid. Using this we end up with: >>> >>> mmsmb config list | grep -i idmap >>> idmap config * : backend autorid >>> idmap config * : range 10000000-299999999 >>> idmap config * : rangesize 1000000 >>> idmap config * : read only yes >>> idmap:cache no >>> >>> >>> It also adds: >>> >>> mmsmb config list | grep -i auth >>> auth methods guest sam winbind >>> >>> (though I don?t think that is a problem). >>> >>> >>> I also can?t change the idmap using the mmsmb command (I think would >>> look >>> like this): >>> # mmsmb config change --option="idmap config * : backend=tdb2" >>> idmap config * : backend=tdb2: [E] Unsupported smb option. More >>> information about smb options is availabe in the man page. >>> >>> >>> >>> I can?t see anything in the docs at: >>> >>> http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>> ct >>> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >>> >>> That give me a clue how to do what I want. >>> >>> I?d be happy to do some mixture of AD for authentication and LDAP for >>> lookups (rather than just falling back to ?local? from nslcd), but I >>> can?t >>> see a way to do this, and ?manual? seems to stop ADS authentication in >>> Samba. >>> >>> Anyone got any suggestions? >>> >>> >>> Thanks >>> >>> Simon >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 495 bytes Desc: Message signed with OpenPGP using GPGMail URL: From S.J.Thompson at bham.ac.uk Wed Jul 22 11:59:56 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 22 Jul 2015 10:59:56 +0000 Subject: [gpfsug-discuss] SMB support and config In-Reply-To: <203758A6-C7E0-4D3F-BA31-A130CF92DCBC@iu.edu> References: <0193637F-4448-4068-A049-3A40A9AFD998@iu.edu> , <203758A6-C7E0-4D3F-BA31-A130CF92DCBC@iu.edu> Message-ID: Hi Kristy, Funny you should ask, I wrote it up last night... http://www.roamingzebra.co.uk/2015/07/smb-protocol-support-with-spectrum.html They did tell me it was all tested with Samba 4, so should work, subject to you checking your own smb config options. 
But i like not having to build it myself now ;) The move was actually pretty easy and in theory you can run mixed over existing nodes and upgraded protocol nodes, but you might need a different clustered name. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Kristy Kallback-Rose [kallbac at iu.edu] Sent: 22 July 2015 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SMB support and config Yes interested, please post. We?ll probably keep running Samba separately, as we do today, for quite some time, but will be facing this transition at some point so we can be supported by IBM for Samba. On Jul 10, 2015, at 8:06 AM, Simon Thompson (Research Computing - IT Services) wrote: > So IBM came back and said what I was doing wasn?t supported. > > They did say that you can use ?user defined? authentication. Which I?ve > got working now on my environment (figured what I was doing wrong, and you > can?t use mmsmb to do some of the bits I need for it to work for user > defined mode for me...). But I still think it needs a patch to one of the > files for CES for use in user defined authentication. (Right now it > appears to remove all my ?user defined? settings from nsswitch.conf when > you stop CES/GPFS on a node). I?ve supplied my patch to IBM which works > for my case, we?ll see what they do about it? > > (If people are interested, I?ll gather my notes into a blog post). > > Simon > > On 06/07/2015 23:06, "Kallback-Rose, Kristy A" wrote: > >> Just to chime in as another interested party, we do something fairly >> similar but use sssd instead of nslcd. Very interested to see how >> accommodating the IBM Samba is to local configuration needs. >> >> Best, >> Kristy >> >> On Jul 6, 2015, at 6:09 AM, Simon Thompson (Research Computing - IT >> Services) wrote: >> >>> Hi, >>> >>> (sorry, lots of questions about this stuff at the moment!) >>> >>> I?m currently looking at removing the sernet smb configs we had >>> previously >>> and moving to IBM SMB. I?ve removed all the old packages and only now >>> have >>> gpfs.smb installed on the systems. >>> >>> I?m struggling to get the config tools to work for our environment. >>> >>> We have MS Windows AD Domain for authentication. For various reasons, >>> however doesn?t hold the UIDs/GIDs, which are instead held in a >>> different >>> LDAP directory. >>> >>> In the past, we?d configure the Linux servers running Samba so that >>> NSLCD >>> was configured to get details from the LDAP server. (e.g. getent passwd >>> would return the data for an AD user). The Linux boxes would also be >>> configured to use KRB5 authentication where users were allowed to ssh >>> etc >>> in for password authentication. >>> >>> So as far as Samba was concerned, it would do ?security = ADS? and then >>> we?d also have "idmap config * : backend = tdb2? >>> >>> I.e. Use Domain for authentication, but look locally for ID mapping >>> data. >>> >>> Now I can configured IBM SMB to use ADS for authentication: >>> >>> mmuserauth service create --type ad --data-access-method file >>> --netbios-name its-rds --user-name ADMINUSER --servers DOMAIN.ADF >>> --idmap-role subordinate >>> >>> >>> However I can?t see anyway for me to manipulate the config so that it >>> doesn?t use autorid. 
Using this we end up with: >>> >>> mmsmb config list | grep -i idmap >>> idmap config * : backend autorid >>> idmap config * : range 10000000-299999999 >>> idmap config * : rangesize 1000000 >>> idmap config * : read only yes >>> idmap:cache no >>> >>> >>> It also adds: >>> >>> mmsmb config list | grep -i auth >>> auth methods guest sam winbind >>> >>> (though I don?t think that is a problem). >>> >>> >>> I also can?t change the idmap using the mmsmb command (I think would >>> look >>> like this): >>> # mmsmb config change --option="idmap config * : backend=tdb2" >>> idmap config * : backend=tdb2: [E] Unsupported smb option. More >>> information about smb options is availabe in the man page. >>> >>> >>> >>> I can?t see anything in the docs at: >>> >>> http://www-01.ibm.com/support/knowledgecenter/#!/STXKQY_4.1.1/com.ibm.spe >>> ct >>> rum.scale.v4r11.adm.doc/bl1adm_configfileauthentication.htm >>> >>> That give me a clue how to do what I want. >>> >>> I?d be happy to do some mixture of AD for authentication and LDAP for >>> lookups (rather than just falling back to ?local? from nslcd), but I >>> can?t >>> see a way to do this, and ?manual? seems to stop ADS authentication in >>> Samba. >>> >>> Anyone got any suggestions? >>> >>> >>> Thanks >>> >>> Simon >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mhabib73 at gmail.com Wed Jul 22 13:58:51 2015 From: mhabib73 at gmail.com (Muhammad Habib) Date: Wed, 22 Jul 2015 08:58:51 -0400 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: <55A625BE.9000809@ebi.ac.uk> References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> <55A625BE.9000809@ebi.ac.uk> Message-ID: did you implement it ? looks ok. All daemon traffic should be going through black network including inter-cluster daemon traffic ( assume black subnet routable). All data traffic should be going through the blue network. You may need to run iptrace or tcpdump to make sure proper network are in use. You can always open a PMR if you having issue during the configuration . Thanks On Wed, Jul 15, 2015 at 5:19 AM, Salvatore Di Nardo wrote: > Thanks for the input.. this is actually very interesting! > > Reading here: > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview > , > specifically the " Using more than one network" part it seems to me that > this way we should be able to split the lease/token/ping from the data. > > Supposing that I implement a GSS cluster with only NDS and a second > cluster with only clients: > > > > As far i understood if on the NDS cluster add first the subnet > 10.20.0.0/16 and then 10.30.0.0 is should use the internal network for > all the node-to-node comunication, leaving the 10.30.0.0/30 only for data > traffic witht he remote cluster ( the clients). 
Similarly, in the client > cluster, adding first 10.10.0.0/16 and then 10.30.0.0, will guarantee > than the node-to-node comunication pass trough a different interface there > the data is passing. Since the client are just "clients" the traffic trough > 10.10.0.0/16 should be minimal (only token ,lease, ping and so on ) and > not affected by the rest. Should be possible at this point move aldo the > "admin network" on the internal interface, so we effectively splitted all > the "non data" traffic on a dedicated interface. > > I'm wondering if I'm missing something, and in case i didn't, what could > be the real traffic in the internal (black) networks ( 1g link its fine or > i still need 10g for that). Another thing I I'm wondering its the load of > the "non data" traffic between the clusters.. i suppose some "daemon > traffic" goes trough the blue interface for the inter-cluster > communication. > > > Any thoughts ? > > Salvatore > > On 13/07/15 18:19, Muhammad Habib wrote: > > Did you look at "subnets" parameter used with "mmchconfig" command. I > think you can use order list of subnets for daemon communication and then > actual daemon interface can be used for data transfer. When the GPFS will > start it will use actual daemon interface for communication , however , > once its started , it will use the IPs from the subnet list whichever > coming first in the list. To further validate , you can put network > sniffer before you do actual implementation or alternatively you can open a > PMR with IBM. > > If your cluster having expel situation , you may fine tune your cluster > e.g. increase ping timeout period , having multiple NSD servers and > distributing filesystems across these NSD servers. Also critical servers > can have HBA cards installed for direct I/O through fiber. > > Thanks > > On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick wrote: > >> Hi, >> >> Yes having separate data and management networks has been critical for >> us for keeping health monitoring/communication unimpeded by data movement. >> >> Not as important, but you can also tune the networks differently >> (packet sizes, buffer sizes, SAK, etc) which can help. >> >> Jason >> >> On Jul 13, 2015, at 7:25 AM, Vic Cornell wrote: >> >> Hi Salvatore, >> >> I agree that that is what the manual - and some of the wiki entries say. >> >> However , when we have had problems (typically congestion) with >> ethernet networks in the past (20GbE or 40GbE) we have resolved them by >> setting up a separate ?Admin? network. >> >> The before and after cluster health we have seen measured in number of >> expels and waiters has been very marked. >> >> Maybe someone ?in the know? could comment on this split. >> >> Regards, >> >> Vic >> >> >> On 13 Jul 2015, at 14:29, Salvatore Di Nardo wrote: >> >> Hello Vic. >> We are currently draining our gpfs to do all the recabling to add a >> management network, but looking what the admin interface does ( man >> mmchnode ) it says something different: >> >> --admin-interface={hostname | ip_address} >> Specifies the name of the node to be used by >> GPFS administration commands when communicating between nodes. The admin >> node name must be specified as an IP >> address or a hostname that is resolved by the >> host command to the desired IP address. If the keyword DEFAULT is >> specified, the admin interface for the >> node is set to be equal to the daemon interface >> for the node. >> >> >> So, seems used only for commands propagation, hence have nothing to do >> with the node-to-node traffic. 
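(For reference, moving just the admin traffic is a per-node change along these lines - the "-mgmt" hostnames are invented stand-ins for addresses on a separate 1GbE management network, not something taken from this thread:

   mmchnode --admin-interface=gss01a-mgmt.ebi.ac.uk -N gss01a.ebi.ac.uk
   mmchnode --admin-interface=gss01b-mgmt.ebi.ac.uk -N gss01b.ebi.ac.uk
   # ...repeat for the remaining servers and clients, then check that the
   # "Admin node name" column of mmlscluster shows the -mgmt names
   mmlscluster

while the daemon interface stays on the bonded 10g links.)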
Infact the other interface description is: >> >> --daemon-interface={hostname | ip_address} >> Specifies the host name or IP address *to be >> used by the GPFS daemons for node-to-node communication*. The host name >> or IP address must refer to the commu- >> nication adapter over which the GPFS daemons >> communicate. Alias interfaces are not allowed. Use the original address or >> a name that is resolved by the >> host command to that original address. >> >> >> The "expired lease" issue and file locking mechanism a( most of our >> expells happens when 2 clients try to write in the same file) are exactly >> node-to node-comunication, so im wondering what's the point to separate >> the "admin network". I want to be sure to plan the right changes before we >> do a so massive task. We are talking about adding a new interface on 700 >> clients, so the recabling work its not small. >> >> >> Regards, >> Salvatore >> >> >> >> On 13/07/15 14:00, Vic Cornell wrote: >> >> Hi Salavatore, >> >> Does your GSS have the facility for a 1GbE ?management? network? If so >> I think that changing the ?admin? node names of the cluster members to a >> set of IPs on the management network would give you the split that you need. >> >> What about the clients? Can they also connect to a separate admin >> network? >> >> Remember that if you are using multi-cluster all of the nodes in both >> networks must share the same admin network. >> >> Kind Regards, >> >> Vic >> >> >> On 13 Jul 2015, at 13:31, Salvatore Di Nardo wrote: >> >> Anyone? >> >> On 10/07/15 11:07, Salvatore Di Nardo wrote: >> >> Hello guys. >> Quite a while ago i mentioned that we have a big expel issue on our gss >> ( first gen) and white a lot people suggested that the root cause could be >> that we use the same interface for all the traffic, and that we should >> split the data network from the admin network. Finally we could plan a >> downtime and we are migrating the data out so, i can soon safelly play with >> the change, but looking what exactly i should to do i'm a bit puzzled. Our >> mmlscluster looks like this: >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> It was my understanding that the "admin node" should use a different >> interface ( a 1g link copper should be fine), while the daemon node is >> where the data was passing , so should point to the bonded 10g interfaces. >> but when i read the mmchnode man page i start to be quite confused. It says: >> >> --daemon-interface={hostname | ip_address} >> Specifies the host name or IP address *to be >> used by the GPFS daemons for node-to-node communication*. 
The host name >> or IP address must refer to the communication adapter over which the GPFS >> daemons communicate. >> Alias interfaces are not allowed. Use the >> original address or a name that is resolved by the host command to that >> original address. >> >> --admin-interface={hostname | ip_address} >> Specifies the name of the node to be used by >> GPFS administration commands when communicating between nodes. The admin >> node name must be specified as an IP address or a hostname that is resolved >> by the host command >> to the desired IP address. If the keyword >> DEFAULT is specified, the admin interface for the node is set to be equal >> to the daemon interface for the node. >> >> What exactly means "node-to node-communications" ? >> Means DATA or also the "lease renew", and the token communication between >> the clients to get/steal the locks to be able to manage concurrent write to >> thr same file? >> Since we are getting expells ( especially when several clients contends >> the same file ) i assumed i have to split this type of packages from the >> data stream, but reading the documentation it looks to me that those >> internal comunication between nodes use the daemon-interface wich i suppose >> are used also for the data. so HOW exactly i can split them? >> >> >> Thanks in advance, >> Salvatore >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > > -- > This communication contains confidential information intended only for the > persons to whom it is addressed. Any other distribution, copying or > disclosure is strictly prohibited. If you have received this communication > in error, please notify the sender and delete this e-mail message > immediately. > > Le pr?sent message contient des renseignements de nature confidentielle > r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, > distribution, divulgation, utilisation ou reproduction de la pr?sente > communication, et de tout fichier qui y est joint, est strictement > interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, > veuillez informer imm?diatement l'exp?diteur et supprimer le message de > votre ordinateur et de votre serveur. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This communication contains confidential information intended only for the persons to whom it is addressed. Any other distribution, copying or disclosure is strictly prohibited. 
If you have received this communication in error, please notify the sender and delete this e-mail message immediately. Le pr?sent message contient des renseignements de nature confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute diffusion, distribution, divulgation, utilisation ou reproduction de la pr?sente communication, et de tout fichier qui y est joint, est strictement interdite. Si vous avez re?u le pr?sent message ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur et supprimer le message de votre ordinateur et de votre serveur. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpfs.jpg Type: image/jpeg Size: 28904 bytes Desc: not available URL: From sdinardo at ebi.ac.uk Wed Jul 22 14:51:04 2015 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 22 Jul 2015 14:51:04 +0100 Subject: [gpfsug-discuss] data interface and management infercace. In-Reply-To: References: <559F9960.7010509@ebi.ac.uk> <55A3AF96.3060303@ebi.ac.uk> <5269A4E9-416B-4D70-AAE0-B86042FC96B9@ddn.com> <55A3BD4E.3000205@ebi.ac.uk> <55A625BE.9000809@ebi.ac.uk> Message-ID: <55AF9FC8.6050107@ebi.ac.uk> Hello, no, still didn't anything because we have to drain 2PB data , into a slower storage.. so it will take few weeks. I expect doing it the second half of August. Will let you all know the results once done and properly tested. Salvatore On 22/07/15 13:58, Muhammad Habib wrote: > did you implement it ? looks ok. All daemon traffic should be going > through black network including inter-cluster daemon traffic ( assume > black subnet routable). All data traffic should be going through the > blue network. You may need to run iptrace or tcpdump to make sure > proper network are in use. You can always open a PMR if you having > issue during the configuration . > > Thanks > > On Wed, Jul 15, 2015 at 5:19 AM, Salvatore Di Nardo > > wrote: > > Thanks for the input.. this is actually very interesting! > > Reading here: > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview > > , > specifically the " Using more than one network" part it seems to > me that this way we should be able to split the lease/token/ping > from the data. > > Supposing that I implement a GSS cluster with only NDS and a > second cluster with only clients: > > > > As far i understood if on the NDS cluster add first the subnet > 10.20.0.0/16 and then 10.30.0.0 is should > use the internal network for all the node-to-node comunication, > leaving the 10.30.0.0/30 only for data > traffic witht he remote cluster ( the clients). Similarly, in the > client cluster, adding first 10.10.0.0/16 > and then 10.30.0.0, will guarantee than the node-to-node > comunication pass trough a different interface there the data is > passing. Since the client are just "clients" the traffic trough > 10.10.0.0/16 should be minimal (only token > ,lease, ping and so on ) and not affected by the rest. Should be > possible at this point move aldo the "admin network" on the > internal interface, so we effectively splitted all the "non data" > traffic on a dedicated interface. > > I'm wondering if I'm missing something, and in case i didn't, what > could be the real traffic in the internal (black) networks ( 1g > link its fine or i still need 10g for that). 
Another thing I I'm > wondering its the load of the "non data" traffic between the > clusters.. i suppose some "daemon traffic" goes trough the blue > interface for the inter-cluster communication. > > > Any thoughts ? > > Salvatore > > On 13/07/15 18:19, Muhammad Habib wrote: >> Did you look at "subnets" parameter used with "mmchconfig" >> command. I think you can use order list of subnets for daemon >> communication and then actual daemon interface can be used for >> data transfer. When the GPFS will start it will use actual >> daemon interface for communication , however , once its started , >> it will use the IPs from the subnet list whichever coming first >> in the list. To further validate , you can put network sniffer >> before you do actual implementation or alternatively you can open >> a PMR with IBM. >> >> If your cluster having expel situation , you may fine tune your >> cluster e.g. increase ping timeout period , having multiple NSD >> servers and distributing filesystems across these NSD servers. >> Also critical servers can have HBA cards installed for direct I/O >> through fiber. >> >> Thanks >> >> On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick > > wrote: >> >> Hi, >> >> Yes having separate data and management networks has been >> critical for us for keeping health monitoring/communication >> unimpeded by data movement. >> >> Not as important, but you can also tune the networks >> differently (packet sizes, buffer sizes, SAK, etc) which can >> help. >> >> Jason >> >> On Jul 13, 2015, at 7:25 AM, Vic Cornell >> > wrote: >> >>> Hi Salvatore, >>> >>> I agree that that is what the manual - and some of the wiki >>> entries say. >>> >>> However , when we have had problems (typically congestion) >>> with ethernet networks in the past (20GbE or 40GbE) we have >>> resolved them by setting up a separate ?Admin? network. >>> >>> The before and after cluster health we have seen measured in >>> number of expels and waiters has been very marked. >>> >>> Maybe someone ?in the know? could comment on this split. >>> >>> Regards, >>> >>> Vic >>> >>> >>>> On 13 Jul 2015, at 14:29, Salvatore Di Nardo >>>> > wrote: >>>> >>>> Hello Vic. >>>> We are currently draining our gpfs to do all the recabling >>>> to add a management network, but looking what the admin >>>> interface does ( man mmchnode ) it says something different: >>>> >>>> --admin-interface={hostname | ip_address} >>>> Specifies the name of the node to be used by GPFS >>>> administration commands when communicating between >>>> nodes. The admin node name must be specified as an IP >>>> address or a hostname that is resolved by the host >>>> command to the desired IP address. If the keyword >>>> DEFAULT is specified, the admin interface for the >>>> node is set to be equal to the daemon interface for >>>> the node. >>>> >>>> >>>> So, seems used only for commands propagation, hence have >>>> nothing to do with the node-to-node traffic. Infact the >>>> other interface description is: >>>> >>>> --daemon-interface={hostname | ip_address} >>>> Specifies the host name or IP address _*to be used >>>> by the GPFS daemons for node-to-node >>>> communication*_. The host name or IP address must >>>> refer to the commu- >>>> nication adapter over which the GPFS daemons >>>> communicate. Alias interfaces are not allowed. Use >>>> the original address or a name that is resolved >>>> by the >>>> host command to that original address. 
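(To make the "subnets" idea above concrete, the sort of configuration being discussed would look roughly like this - the subnet values simply follow the coloured diagram earlier in the thread, and the exact syntax, including the optional cluster-name qualifiers needed in some multi-cluster setups, should be checked against the mmchconfig documentation:

   # on the NSD/GSS cluster: prefer the internal 10.20.0.0 network for daemon traffic
   mmchconfig subnets="10.20.0.0 10.30.0.0"

   # on the client cluster: prefer its internal 10.10.0.0 network
   mmchconfig subnets="10.10.0.0 10.30.0.0"

   # the setting is only picked up at daemon start, so a rolling
   # mmshutdown/mmstartup is needed; afterwards "mmdiag --network" on a few
   # nodes shows which addresses the connections are actually using

how well the internal links hold up under that traffic is exactly the open question here.)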
>>>> >>>> >>>> The "expired lease" issue and file locking mechanism a( >>>> most of our expells happens when 2 clients try to write in >>>> the same file) are exactly node-to node-comunication, so >>>> im wondering what's the point to separate the "admin >>>> network". I want to be sure to plan the right changes >>>> before we do a so massive task. We are talking about adding >>>> a new interface on 700 clients, so the recabling work its >>>> not small. >>>> >>>> >>>> Regards, >>>> Salvatore >>>> >>>> >>>> >>>> On 13/07/15 14:00, Vic Cornell wrote: >>>>> Hi Salavatore, >>>>> >>>>> Does your GSS have the facility for a 1GbE ?management? >>>>> network? If so I think that changing the ?admin? node >>>>> names of the cluster members to a set of IPs on the >>>>> management network would give you the split that you need. >>>>> >>>>> What about the clients? Can they also connect to a >>>>> separate admin network? >>>>> >>>>> Remember that if you are using multi-cluster all of the >>>>> nodes in both networks must share the same admin network. >>>>> >>>>> Kind Regards, >>>>> >>>>> Vic >>>>> >>>>> >>>>>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo >>>>>> > wrote: >>>>>> >>>>>> Anyone? >>>>>> >>>>>> On 10/07/15 11:07, Salvatore Di Nardo wrote: >>>>>>> Hello guys. >>>>>>> Quite a while ago i mentioned that we have a big expel >>>>>>> issue on our gss ( first gen) and white a lot people >>>>>>> suggested that the root cause could be that we use the >>>>>>> same interface for all the traffic, and that we should >>>>>>> split the data network from the admin network. Finally >>>>>>> we could plan a downtime and we are migrating the data >>>>>>> out so, i can soon safelly play with the change, but >>>>>>> looking what exactly i should to do i'm a bit puzzled. >>>>>>> Our mmlscluster looks like this: >>>>>>> >>>>>>> GPFS cluster information >>>>>>> ======================== >>>>>>> GPFS cluster name: GSS.ebi.ac.uk >>>>>>> >>>>>>> GPFS cluster id: 17987981184946329605 >>>>>>> GPFS UID domain: GSS.ebi.ac.uk >>>>>>> >>>>>>> Remote shell command: /usr/bin/ssh >>>>>>> Remote file copy command: /usr/bin/scp >>>>>>> >>>>>>> GPFS cluster configuration servers: >>>>>>> ----------------------------------- >>>>>>> Primary server: gss01a.ebi.ac.uk >>>>>>> >>>>>>> Secondary server: gss02b.ebi.ac.uk >>>>>>> >>>>>>> >>>>>>> Node Daemon node name IP address Admin >>>>>>> node name Designation >>>>>>> ----------------------------------------------------------------------- >>>>>>> 1 gss01a.ebi.ac.uk >>>>>>> 10.7.28.2 >>>>>>> gss01a.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 2 gss01b.ebi.ac.uk >>>>>>> 10.7.28.3 >>>>>>> gss01b.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 3 gss02a.ebi.ac.uk >>>>>>> 10.7.28.67 >>>>>>> gss02a.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 4 gss02b.ebi.ac.uk >>>>>>> 10.7.28.66 >>>>>>> gss02b.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 5 gss03a.ebi.ac.uk >>>>>>> 10.7.28.34 >>>>>>> gss03a.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> 6 gss03b.ebi.ac.uk >>>>>>> 10.7.28.35 >>>>>>> gss03b.ebi.ac.uk >>>>>>> quorum-manager >>>>>>> >>>>>>> >>>>>>> It was my understanding that the "admin node" should use >>>>>>> a different interface ( a 1g link copper should be >>>>>>> fine), while the daemon node is where the data was >>>>>>> passing , so should point to the bonded 10g interfaces. >>>>>>> but when i read the mmchnode man page i start to be >>>>>>> quite confused. 
It says: >>>>>>> >>>>>>> --daemon-interface={hostname | ip_address} >>>>>>> Specifies the host name or IP address _*to be used by >>>>>>> the GPFS daemons for node-to-node communication*_. The >>>>>>> host name or IP address must refer to the communication >>>>>>> adapter over which the GPFS daemons communicate. >>>>>>> Alias interfaces are not allowed. Use the >>>>>>> original address or a name that is resolved by the host >>>>>>> command to that original address. >>>>>>> >>>>>>> --admin-interface={hostname | ip_address} >>>>>>> Specifies the name of the node to be used by GPFS >>>>>>> administration commands when communicating between >>>>>>> nodes. The admin node name must be specified as an IP >>>>>>> address or a hostname that is resolved by the host command >>>>>>> tothe desired IP address. If the keyword >>>>>>> DEFAULT is specified, the admin interface for the node >>>>>>> is set to be equal to the daemon interface for the node. >>>>>>> >>>>>>> What exactly means "node-to node-communications" ? >>>>>>> Means DATA or also the "lease renew", and the token >>>>>>> communication between the clients to get/steal the locks >>>>>>> to be able to manage concurrent write to thr same file? >>>>>>> Since we are getting expells ( especially when several >>>>>>> clients contends the same file ) i assumed i have to >>>>>>> split this type of packages from the data stream, but >>>>>>> reading the documentation it looks to me that those >>>>>>> internal comunication between nodes use the >>>>>>> daemon-interface wich i suppose are used also for the >>>>>>> data. so HOW exactly i can split them? >>>>>>> >>>>>>> >>>>>>> Thanks in advance, >>>>>>> Salvatore >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> gpfsug-discuss mailing list >>>>>>> gpfsug-discuss atgpfsug.org >>>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>>> >>>>>> _______________________________________________ >>>>>> gpfsug-discuss mailing list >>>>>> gpfsug-discuss at gpfsug.org >>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> -- >> This communication contains confidential information intended >> only for the persons to whom it is addressed. Any other >> distribution, copying or disclosure is strictly prohibited. If >> you have received this communication in error, please notify the >> sender and delete this e-mail message immediately. >> >> Le pr?sent message contient des renseignements de nature >> confidentielle r?serv?s uniquement ? l'usage du destinataire. >> Toute diffusion, distribution, divulgation, utilisation ou >> reproduction de la pr?sente communication, et de tout fichier qui >> y est joint, est strictement interdite. Si vous avez re?u le >> pr?sent message ?lectronique par erreur, veuillez informer >> imm?diatement l'exp?diteur et supprimer le message de votre >> ordinateur et de votre serveur. 
>> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss atgpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -- > This communication contains confidential information intended only for > the persons to whom it is addressed. Any other distribution, copying > or disclosure is strictly prohibited. If you have received this > communication in error, please notify the sender and delete this > e-mail message immediately. > > Le pr?sent message contient des renseignements de nature > confidentielle r?serv?s uniquement ? l'usage du destinataire. Toute > diffusion, distribution, divulgation, utilisation ou reproduction de > la pr?sente communication, et de tout fichier qui y est joint, est > strictement interdite. Si vous avez re?u le pr?sent message > ?lectronique par erreur, veuillez informer imm?diatement l'exp?diteur > et supprimer le message de votre ordinateur et de votre serveur. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 28904 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Jul 27 22:24:11 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 27 Jul 2015 21:24:11 +0000 Subject: [gpfsug-discuss] GPFS and Community Scientific Cloud Message-ID: Hi Ulf, Thanks for the email, as suggested, I'm copying this to the GPFS UG mailing list as well as I'm sure the discussion is of interest to others. I guess what we're looking to do is to have arbitrary VMs running provided by users (I.e. Completely untrusted), but to provide them a way to get secure access to only their data. Right now we can't give them a GPFS client as this is too trusting, I was wondering how easy it would be for us to implement something like: User has a VM User runs 'kinit user at DOMAIN' to gain kerberos ticket and can then securely gain access to only their files from my NFS server. I also mentioned Janet ASSENT, which is a relatively recent project: https://jisc.ac.uk/assent (It was piloted as Janet Moonshot). Which builds on top of SAML to provide other software access to federation. My understanding is that site-specific UID mapping is needed (e.g. On the NFS/GPFS server). Simon >I have some experience with the following questions: > >> NFS just isn?t built for security really. I guess NFSv4 with KRB5 is >> one option to look at, with user based credentials. That might just >> about be feasible if the user were do authenticate with kinit before >> being able to access NFSv4 mounted files. I.e. Its done at the user >> level rather than the instance level. That might be an interesting >> project as a feasibility study to look at, will it work? How would >> we integrate into a federated access management system (something >> like UK Federation and ABFAB/Moonshot/Assent maybe?). Could we >> provide easy steps for a user in a VM to follow? Can we even make it >> work with Ganesha in such an environment? > > >Kerberized NFSv3 and Kerberized NFSv4 provide nearly the same level of >security. 
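(From the VM side, the workflow sketched above would be something along these lines - server name, realm and export path are invented for the example, and the VM still needs its own machine credential in a keytab before the mount itself will work:

   # once, as root on the VM, after joining it to the realm:
   mount -t nfs4 -o sec=krb5p nfs.example.ac.uk:/gpfs/users /mnt/users

   # per user, before they can touch their files:
   kinit user@DOMAIN.EXAMPLE.AC.UK
   ls /mnt/users/user

sec=krb5 gives authentication only, krb5i adds integrity checking and krb5p also encrypts the traffic.)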
Kerberos makes the difference and not the NFS version. I have >posted some background information to the GPFS forum: >http://ibm.co/1VFLUR4 > >Kerberized NFSv4 has the advantage that it allows different UID/GID ranges >on NFS server and NFS client. I have led a proof-of-concept where we have >used this feature to provide secure data access to personalized patient >data for multiple tenants where the tenants had conflicting UID/GID >ranges. >I have some material which I will share via the GPFS forum. > >UK Federation seems to be based on SAML/Shibboleth. Unfortunately there is >no easy integration of network file protocols such as NFS and SMB and >SAML/Shibboleth, because file protocols require attributes which are >typically not stored in SAML/Shibboleth. Fortunately I provided technical >guidance to a customer who exactly implemented this integration in order >to >provide secure file service to multiple universities, again with >conflicting UID/GID ranges. I need some time to write it up and publish >it. From martin.gasthuber at desy.de Tue Jul 28 17:28:44 2015 From: martin.gasthuber at desy.de (Martin Gasthuber) Date: Tue, 28 Jul 2015 18:28:44 +0200 Subject: [gpfsug-discuss] fast ACL alter solution Message-ID: Hi, for a few months now we've been running a new infrastructure, with the core built on GPFS (4.1.0.8), for 'light source - X-Rays' experiments local to the site. The system is used for the data acquisition chain, data analysis, data exports and archive. Right now we have new detector types (homebuilt, experimental) generating millions of small files - the last run produced ~9 million files at 64 to 128K in size ;-). In our setup, the files get copied to a (user accessible) GPFS instance which controls access by NFSv4 ACLs (only!) and from time to time we have to modify these ACLs (add/remove user/group etc.). Using a simple (non policy-run based) approach, changing 9 million files takes ~200 hours to run - which we don't consider a good option. Running mmgetacl/mmputacl within a policy run will clearly speed that up - but the biggest time-consuming operations are still the get and put ACL ops.
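To give an idea of what that looks like in practice, the shape of the policy-run approach is roughly the following (script name, target path and the prepared ACL file are placeholders rather than our real setup):

   # aclfix.pol
   RULE EXTERNAL LIST 'aclfix' EXEC '/usr/local/bin/aclfix.sh'
   RULE 'pick' LIST 'aclfix'

   # /usr/local/bin/aclfix.sh (placeholder name)
   #!/bin/bash
   # mmapplypolicy calls this with an operation ('TEST' or 'LIST') and a
   # file list whose lines end in "-- /full/path/name"
   case "$1" in
     TEST) exit 0 ;;
     LIST)
       while read -r line; do
         path="${line#*-- }"
         # wholesale replace with a prepared NFSv4 ACL; a per-file
         # mmgetacl / edit / mmputacl round trip is the slower variant
         mmputacl -i /root/new_run.acl "$path"
       done < "$2"
       ;;
   esac

   # driven in parallel over a few nodes:
   mmapplypolicy /gpfs/exp/run42 -P aclfix.pol -N node1,node2 -m 24 -B 1000

but even spread over several nodes the per-file ACL calls dominate the runtime.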
Is anybody aware of any > faster ACL access operation (whithin the policy-run) - or even a > 'mod-acl' operation ? > In the past IBM have said that their expectations are that the ACL's are set via Windows on remote workstations and not from the command line on the GPFS servers themselves!!! Crazy I know. There really needs to be a mm version of the NFSv4 setfacl/nfs4_getfacl commands that ideally makes use of the fast inode traversal features to make things better. In the past I wrote some C code that set specific ACL's on files. This however was to deal with migrating files onto a system and needed to set initial ACL's and didn't make use of the fast traversal features and is completely unpolished. A good starting point would probably be the FreeBSD setfacl/getfacl tools, that at least was my plan but I have never gotten around to it. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From TROPPENS at de.ibm.com Wed Jul 29 09:02:59 2015 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 29 Jul 2015 10:02:59 +0200 Subject: [gpfsug-discuss] GPFS and Community Scientific Cloud In-Reply-To: References: Message-ID: Hi Simon, I have started to draft a response, but it gets longer and longer. I need some more time to respond. Best regards, Ulf. -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 27.07.2015 23:24 Subject: Re: [gpfsug-discuss] GPFS and Community Scientific Cloud Sent by: gpfsug-discuss-bounces at gpfsug.org Hi Ulf, Thanks for the email, as suggested, I'm copying this to the GPFS UG mailing list as well as I'm sure the discussion is of interest to others. I guess what we're looking to do is to have arbitrary VMs running provided by users (I.e. Completely untrusted), but to provide them a way to get secure access to only their data. Right now we can't give them a GPFS client as this is too trusting, I was wondering how easy it would be for us to implement something like: User has a VM User runs 'kinit user at DOMAIN' to gain kerberos ticket and can then securely gain access to only their files from my NFS server. I also mentioned Janet ASSENT, which is a relatively recent project: https://jisc.ac.uk/assent (It was piloted as Janet Moonshot). Which builds on top of SAML to provide other software access to federation. My understanding is that site-specific UID mapping is needed (e.g. On the NFS/GPFS server). Simon >I have some experience with the following questions: > >> NFS just isn?t built for security really. I guess NFSv4 with KRB5 is >> one option to look at, with user based credentials. That might just >> about be feasible if the user were do authenticate with kinit before >> being able to access NFSv4 mounted files. I.e. Its done at the user >> level rather than the instance level. That might be an interesting >> project as a feasibility study to look at, will it work? How would >> we integrate into a federated access management system (something >> like UK Federation and ABFAB/Moonshot/Assent maybe?). Could we >> provide easy steps for a user in a VM to follow? Can we even make it >> work with Ganesha in such an environment? 
> > >Kerberized NFSv3 and Kerberized NFSv4 provide nearly the same level of >security. Kerberos makes the difference and not the NFS version. I have >posted some background information to the GPFS forum: >http://ibm.co/1VFLUR4 > >Kerberized NFSv4 has the advantage that it allows different UID/GID ranges >on NFS server and NFS client. I have led a proof-of-concept where we have >used this feature to provide secure data access to personalized patient >data for multiple tenants where the tenants had conflicting UID/GID >ranges. >I have some material which I will share via the GPFS forum. > >UK Federation seems to be based on SAML/Shibboleth. Unfortunately there is >no easy integration of network file protocols such as NFS and SMB and >SAML/Shibboleth, because file protocols require attributes which are >typically not stored in SAML/Shibboleth. Fortunately I provided technical >guidance to a customer who exactly implemented this integration in order >to >provide secure file service to multiple universities, again with >conflicting UID/GID ranges. I need some time to write it up and publish >it. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chair at gpfsug.org Thu Jul 30 21:36:07 2015 From: chair at gpfsug.org (chair-gpfsug.org) Date: Thu, 30 Jul 2015 21:36:07 +0100 Subject: [gpfsug-discuss] July Meet the devs Message-ID: I've heard some great feedback about the July meet the devs held at IBM Warwick this week. Thanks to Ross and Patrick at IBM and Clare for coordinating the registration for this! Jez has a few photos so we'll try and get those uploaded in the next week or so to the website. Simon (GPFS UG Chair)