From jonathan.buzzard at strath.ac.uk Fri Oct 2 17:14:12 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 2 Oct 2020 17:14:12 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes Message-ID: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> What if any are the rules around running additional services on DSS/ESS nodes with regard to support? Let me outline our scenario Our main cluster uses 10Gbps ethernet for storage with the DSS-G nodes hooked up with redundant 40Gbps ethernet. However we have an older cluster that is used for undergraduate teaching that only has 1Gbps ethernet and QDR Infiniband. With no money to upgrade this to 10Gbps ethernet to support this we flipped one of the ports on the ConnectX4 cards on each DSS-G node to Infiniband and using IPoIB run the teaching nodes in this way. However it means that we need an Ethernet to Infiniband gateway as the ethernet only connected nodes want to talk to the Infiniband connected ones on their Infiniband address. Not a problem we grabbed an old spare machine installed CentOS and configured it up to act as a bridge, and deploy a custom route to all the ethernet only connected nodes. It has been working fine for a couple of years now. The problem is that this becomes firstly a single point of failure, on hardware that is six years old now. Secondly to apply updates on the gateway machine means all the teaching nodes have to be drained and GPFS umounted to reboot the machine after updates have been installed. It is currently not getting patched as frequently as I would like (and required by the Scottish government). So thinking about it I have come to the conclusion that the ideal situation would be to use the DSS-G nodes as the gateway and run keepalived to move the gateway ethernet IP address between the two machines. It is idea because as long as one DSS-G node is up then there is a functioning gateway and nodes don't get ejected from the cluster. If both DSS-G nodes are down then there is no GPFS to mount anyway and lack of a gateway is a moot point. I grabbed a couple of the teaching compute nodes in the summer and trialed it out. It works a treat. I now need to check IBM are not going to throw a wobbler down the line if I need to get support before deploying it to the DSS-G nodes :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Fri Oct 2 23:19:15 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Fri, 2 Oct 2020 22:19:15 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> Message-ID: Jonathan, I suggest you get a formal statement from Lenovo as the DSS-G Platform is no longer an IBM platform. But for ESS based platforms the answer would be, it is not supported to run anything on the IO Servers other than GNR and the relevant Scale management services, due to the fact that if you lose an IO Server, or if you in an extended maintenance window the Server needs to host all the work that would be being performed by both IO servers. I don't know if Lenovo have different point if view. Regards, Andrew Sent from my iPhone > On 3 Oct 2020, at 02:14, Jonathan Buzzard wrote: > > > What if any are the rules around running additional services on DSS/ESS > nodes with regard to support? 
Let me outline our scenario > > Our main cluster uses 10Gbps ethernet for storage with the DSS-G nodes > hooked up with redundant 40Gbps ethernet. > > However we have an older cluster that is used for undergraduate teaching > that only has 1Gbps ethernet and QDR Infiniband. With no money to > upgrade this to 10Gbps ethernet to support this we flipped one of the > ports on the ConnectX4 cards on each DSS-G node to Infiniband and using > IPoIB run the teaching nodes in this way. > > However it means that we need an Ethernet to Infiniband gateway as the > ethernet only connected nodes want to talk to the Infiniband connected > ones on their Infiniband address. Not a problem we grabbed an old spare > machine installed CentOS and configured it up to act as a bridge, and > deploy a custom route to all the ethernet only connected nodes. It has > been working fine for a couple of years now. > > The problem is that this becomes firstly a single point of failure, on > hardware that is six years old now. Secondly to apply updates on the > gateway machine means all the teaching nodes have to be drained and GPFS > umounted to reboot the machine after updates have been installed. It is > currently not getting patched as frequently as I would like (and > required by the Scottish government). > > So thinking about it I have come to the conclusion that the ideal > situation would be to use the DSS-G nodes as the gateway and run > keepalived to move the gateway ethernet IP address between the two > machines. It is idea because as long as one DSS-G node is up then there > is a functioning gateway and nodes don't get ejected from the cluster. > If both DSS-G nodes are down then there is no GPFS to mount anyway and > lack of a gateway is a moot point. > > I grabbed a couple of the teaching compute nodes in the summer and > trialed it out. It works a treat. > > I now need to check IBM are not going to throw a wobbler down the line > if I need to get support before deploying it to the DSS-G nodes :-) > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sat Oct 3 11:06:41 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 3 Oct 2020 11:06:41 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: <7af0ac41-a280-5ecd-3658-7af761a4bf9b@strath.ac.uk> On 02/10/2020 23:19, Andrew Beattie wrote: > Jonathan, > I suggest you get a formal statement from Lenovo as the DSS-G Platform > is no longer an IBM platform. > > But for ESS based platforms the answer would be, it is not supported to > run anything on the IO Servers other than GNR and the relevant Scale > management services, due to the fact that if you lose an IO Server, or > if you in an extended maintenance window the Server needs to host all > the work that would be being performed by both IO servers. > In the past ~500 days the Infiniband to Ethernet gateway has shifted ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the DSS-G nodes have each shifted several PB. The proposed additional traffic is a drop in the ocean. 
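To put some flesh on what "acting as the gateway" would actually involve: it is no more than IP forwarding plus a single VRRP instance to float the gateway address between the two DSS-G nodes. A rough sketch of what I trialed on the spare nodes is below; the interface names, addresses and subnets are made up for illustration rather than being our real ones.

    # /etc/keepalived/keepalived.conf on each DSS-G node
    vrrp_instance gpfs_gw {
        state BACKUP            # both nodes start as BACKUP, priority decides who holds the VIP
        nopreempt               # no automatic fail-back, avoids a second blip when a node returns
        interface bond0         # the 40Gbps ethernet side
        virtual_router_id 51
        priority 100            # e.g. 90 on the second DSS-G node
        advert_int 1
        virtual_ipaddress {
            10.10.0.254/24      # the floating gateway address
        }
    }

    # plus forwarding enabled on both nodes
    sysctl -w net.ipv4.ip_forward=1

The ethernet-only nodes keep exactly the same static route for the IPoIB subnet that they have today (e.g. "ip route add 10.20.0.0/24 via 10.10.0.254"), it just points at the floating address instead of the old gateway box.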
On my actual routers which shift much more data (over 300TB externally) with an uptime of ~180 days at the moment the CPU time consumed by keepalived is just under 31 minutes or about 8 seconds a day. These are much punier CPU's too. The proposed additional CPU usage is another drop in the ocean. Given Lenovo sold the *same* configuration with x3650's and SR650's the "need all the CPU grunt" is somewhat fishy. Between the bid being submitted and actual tender award the SR650's came out and we paid a bit extra to uplift to the newer server hardware with exactly the same disk configuration. I believe IBM have done the same with the ESS/GNR servers too over time the same applies there too. IMHO given keepalived is a base RHEL package, IBM/Lenovo should be offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as a supported configuration for mixed network technology clusters :-) Running a couple extra servers for this purpose is obnoxious from an environmental standpoint. That's IBM's green credentials out the window if you ask me. I would note under those rules running a Nagios, Zabbix etc. client on the nodes is not permitted either. I would suggest that most sites would be rather unhappy about that :-) > I don't know if Lenovo have different point if view. > Problem is when I ring up for support on my DSS-G I speak to an IBM employee not a Lenovo one :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Sat Oct 3 11:55:05 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Sat, 3 Oct 2020 10:55:05 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <7af0ac41-a280-5ecd-3658-7af761a4bf9b@strath.ac.uk> Message-ID: Why do you need to run any kind of monitoring client on an IO server the GUI / performance monitor already does all of that work for you and collects the data on the dedicated EMS server. If you have a small storage environment the. Yes the processor and memory may feel like overkill, but tuned appropriately an IO server will use all the memory you can give it to drive IO performance, If you want to run a hybrid / non standard architecture then the IBM ESS / DGSS platform may not be the right platform in comparison to a build your own architecture, how ever you then take all the support issues onto your self rather than it being the vendors problem. Sent from my iPhone > On 3 Oct 2020, at 20:06, Jonathan Buzzard wrote: > > On 02/10/2020 23:19, Andrew Beattie wrote: >> Jonathan, >> I suggest you get a formal statement from Lenovo as the DSS-G Platform >> is no longer an IBM platform. >> >> But for ESS based platforms the answer would be, it is not supported to >> run anything on the IO Servers other than GNR and the relevant Scale >> management services, due to the fact that if you lose an IO Server, or >> if you in an extended maintenance window the Server needs to host all >> the work that would be being performed by both IO servers. >> > > In the past ~500 days the Infiniband to Ethernet gateway has shifted > ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the > DSS-G nodes have each shifted several PB. The proposed additional > traffic is a drop in the ocean. > > On my actual routers which shift much more data (over 300TB externally) > with an uptime of ~180 days at the moment the CPU time consumed by > keepalived is just under 31 minutes or about 8 seconds a day. 
These are > much punier CPU's too. The proposed additional CPU usage is another drop > in the ocean. > > Given Lenovo sold the *same* configuration with x3650's and SR650's the > "need all the CPU grunt" is somewhat fishy. Between the bid being > submitted and actual tender award the SR650's came out and we paid a bit > extra to uplift to the newer server hardware with exactly the same disk > configuration. I believe IBM have done the same with the ESS/GNR servers > too over time the same applies there too. > > IMHO given keepalived is a base RHEL package, IBM/Lenovo should be > offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as > a supported configuration for mixed network technology clusters :-) > > Running a couple extra servers for this purpose is obnoxious from an > environmental standpoint. That's IBM's green credentials out the window > if you ask me. > > I would note under those rules running a Nagios, Zabbix etc. client on > the nodes is not permitted either. I would suggest that most sites would > be rather unhappy about that :-) > > >> I don't know if Lenovo have different point if view. >> > > Problem is when I ring up for support on my DSS-G I speak to an IBM > employee not a Lenovo one :-) > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Sat Oct 3 12:19:36 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sat, 3 Oct 2020 11:19:36 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: Message-ID: Are you mixing those ESS DSS in the same cluster? Or you are only running DSS https://www.ibm.com/support/knowledgecenter/SSYSP8/gnrfaq.html?view=kc#supportqs__building Mixing DSS and ESS in the same cluster is not a supported configuration. You really need to talk with Lenovo as is your vendor. The fact that in your region your support is being given by an IBMer or not is not a relevant point. High enough in the chain always will end at IBM on any region as GNR is IBM tech for 17 years (yes 17) so if weird enough even on regions where Lenovo might do even third level it might end on development and/or research. But that is a Lenovo/IBM agreement not you and IBM. So please get the support statement from Lenovo about this and pls share it if you want/can so we all learn their position. Thanks. -- Cheers > On 3. Oct 2020, at 13.55, Andrew Beattie wrote: > > ? > Why do you need to run any kind of monitoring client on an IO server the GUI / performance monitor already does all of that work for you and collects the data on the dedicated EMS server. > > If you have a small storage environment the. Yes the processor and memory may feel like overkill, but tuned appropriately an IO server will use all the memory you can give it to drive IO performance, > > If you want to run a hybrid / non standard architecture then the IBM ESS / DGSS platform may not be the right platform in comparison to a build your own architecture, how ever you then take all the support issues onto your self rather than it being the vendors problem. 
> > Sent from my iPhone > > > On 3 Oct 2020, at 20:06, Jonathan Buzzard wrote: > > > > On 02/10/2020 23:19, Andrew Beattie wrote: > >> Jonathan, > >> I suggest you get a formal statement from Lenovo as the DSS-G Platform > >> is no longer an IBM platform. > >> > >> But for ESS based platforms the answer would be, it is not supported to > >> run anything on the IO Servers other than GNR and the relevant Scale > >> management services, due to the fact that if you lose an IO Server, or > >> if you in an extended maintenance window the Server needs to host all > >> the work that would be being performed by both IO servers. > >> > > > > In the past ~500 days the Infiniband to Ethernet gateway has shifted > > ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the > > DSS-G nodes have each shifted several PB. The proposed additional > > traffic is a drop in the ocean. > > > > On my actual routers which shift much more data (over 300TB externally) > > with an uptime of ~180 days at the moment the CPU time consumed by > > keepalived is just under 31 minutes or about 8 seconds a day. These are > > much punier CPU's too. The proposed additional CPU usage is another drop > > in the ocean. > > > > Given Lenovo sold the *same* configuration with x3650's and SR650's the > > "need all the CPU grunt" is somewhat fishy. Between the bid being > > submitted and actual tender award the SR650's came out and we paid a bit > > extra to uplift to the newer server hardware with exactly the same disk > > configuration. I believe IBM have done the same with the ESS/GNR servers > > too over time the same applies there too. > > > > IMHO given keepalived is a base RHEL package, IBM/Lenovo should be > > offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as > > a supported configuration for mixed network technology clusters :-) > > > > Running a couple extra servers for this purpose is obnoxious from an > > environmental standpoint. That's IBM's green credentials out the window > > if you ask me. > > > > I would note under those rules running a Nagios, Zabbix etc. client on > > the nodes is not permitted either. I would suggest that most sites would > > be rather unhappy about that :-) > > > > > >> I don't know if Lenovo have different point if view. > >> > > > > Problem is when I ring up for support on my DSS-G I speak to an IBM > > employee not a Lenovo one :-) > > > > > > JAB. > > > > -- > > Jonathan A. Buzzard Tel: +44141-5483420 > > HPC System Administrator, ARCHIE-WeSt. > > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sat Oct 3 18:16:33 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 3 Oct 2020 18:16:33 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: On 03/10/2020 11:55, Andrew Beattie wrote: > Why do you need to run any kind of monitoring client on an IO server the > GUI / performance monitor already does all of that work for you and > collects the data on the dedicated EMS server. 
Because any remotely sensible admin demands a single pane service monitoring system. If I have to look at A for everything but my DSS-G and B for my DSS-G that's an epic fail. I often feel there is a huge disjuncture between the people that develop systems and those that look after them; they think the world revolves around them. It is clear this is one of those cases. > > If you have a small storage environment the. Yes the processor and > memory may feel like overkill, but tuned appropriately an IO server will > use all the memory you can give it to drive IO performance, Right but the SR650's came with not only more CPU but more RAM than the x3650's. In which case why only 192GB of RAM? The SR650's can take much more than that. Why not 384GB of RAM :-) Right now we have a shade over 50GB of RAM being unused. Been way for like ever because we naughtily have a influx DB client setup for a Grafana dashboard. We also presumably naughtily have remote syslog to Splunk. > > If you want to run a hybrid / non standard architecture then the IBM ESS > / DGSS platform may not be the right platform in comparison to a build > your own architecture, how ever you then take all the support issues > onto your self rather than it being the vendors problem. > I don't see anything that says you can't have some clients ethernet connected and some Infiniband connected. That of course requires a gateway, and the most logical place to put it is on the ESS or DSS nodes IMHO. I will see what Lenovo has to say, but looks like the IBM position is decidedly let's burn the planet, who gives a dam. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Sat Oct 3 18:16:39 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 3 Oct 2020 18:16:39 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: On 03/10/2020 12:19, Luis Bolinches wrote: > Are you mixing those ESS DSS in the same cluster? Or you are only > running DSS > Only running DSS. We are too far down the rabbit hole to ever switch to ESS now. > > Mixing DSS and ESS in the same cluster is not a supported configuration. > I know, it means you can never ever migrate your storage from DSS to ESS without a full backup and restore. Who with any significant amount of storage is going to want to do that? The logic behind this escapes me, or perhaps in that scenario IBM might relax the rules for the migration period. > You really need to talk with Lenovo as is your vendor. The fact that in > your region your support is being given by an IBMer or not is not a > relevant point. High enough in the chain always will end at IBM on any > region as GNR is IBM tech for 17 years (yes 17) so if weird enough even > on regions where Lenovo might do even third level it might end on > development and/or research. But that is a Lenovo/IBM agreement not you > and IBM. > > So please get the support statement from Lenovo about this and pls share > it if you want/can so we all learn their position. > Will attempt that, though I do think it should be a supported config out the box :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From luis.bolinches at fi.ibm.com Sun Oct 4 10:29:34 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sun, 4 Oct 2020 09:29:34 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: Message-ID: Hi As stated on the same link you can do remote mounts from each other and be a supported setup. ? You can use the remote mount feature of IBM Spectrum Scale to share file system data across clusters.? -- Cheers > On 3. Oct 2020, at 20.16, Jonathan Buzzard wrote: > > ?On 03/10/2020 12:19, Luis Bolinches wrote: >> Are you mixing those ESS DSS in the same cluster? Or you are only >> running DSS > > Only running DSS. We are too far down the rabbit hole to ever switch to > ESS now. > >> Mixing DSS and ESS in the same cluster is not a supported configuration. > > I know, it means you can never ever migrate your storage from DSS to ESS > without a full backup and restore. Who with any significant amount of > storage is going to want to do that? The logic behind this escapes me, > or perhaps in that scenario IBM might relax the rules for the migration > period. > > >> You really need to talk with Lenovo as is your vendor. The fact that in >> your region your support is being given by an IBMer or not is not a >> relevant point. High enough in the chain always will end at IBM on any >> region as GNR is IBM tech for 17 years (yes 17) so if weird enough even >> on regions where Lenovo might do even third level it might end on >> development and/or research. But that is a Lenovo/IBM agreement not you >> and IBM. >> So please get the support statement from Lenovo about this and pls share >> it if you want/can so we all learn their position. > > Will attempt that, though I do think it should be a supported config out > the box :-) > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sun Oct 4 11:17:30 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sun, 4 Oct 2020 11:17:30 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk> On 04/10/2020 10:29, Luis Bolinches wrote: > Hi > > As stated on the same link you can do remote mounts from each other and > be a supported setup. > > ??You can use the remote mount feature of IBM Spectrum Scale to share > file system data across clusters.? > You can, but imagine I have a DSS-G cluster, with 2PB of storage on it which is quite modest in 2020. It is now end of life and for whatever reason I decide I want to move to ESS instead. What any sane storage admin want to do at this stage is set the ESS, add the ESS nodes to the existing cluster on the DSS-G then do a bit of mmadddisk/mmdeldisk and sit back while the data is seemlessly moved from the DSS-G to the ESS. Admittedly this might take a while :-) Then once all the data is moved a bit of mmdelnode and bingo the storage has been migrated from DSS-G to ESS with zero downtime. 
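In outline it is nothing more exotic than the usual add-and-drain dance, roughly as below. The file system, stanza file, NSD and node names are invented purely for illustration, and the recovery groups and vdisks would of course have to be created on the ESS side first.

    mmaddnode -N ess-io1,ess-io2              # join the ESS building block to the existing cluster
    mmadddisk gpfs0 -F ess_disks.stanza       # add the new NSDs into the existing file system
    mmdeldisk gpfs0 "dss_nsd001;dss_nsd002"   # mmdeldisk drains the data off the old NSDs as it removes them
    mmrestripefs gpfs0 -b                     # optionally rebalance across what is left
    mmdelnode -N dss-io1,dss-io2              # retire the old building block

All of which runs with the file system mounted and the users none the wiser, which is rather the point.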
As that is not allowed for what I presume are commercial reasons (you could do it in reverse and presumable that is what IBM don't want) then once you are down the rabbit hole of one type of storage the you are not going to switch to a different one. You need to look at it from the perspective of the users. They frankly could not give a monkeys what storage solution you are using. All they care about is having usable storage and large amounts of downtime to switch from one storage type to another is not really acceptable. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From olaf.weiser at de.ibm.com Mon Oct 5 07:19:40 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 5 Oct 2020 06:19:40 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk> References: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: From jordi.caubet at es.ibm.com Mon Oct 5 07:27:39 2020 From: jordi.caubet at es.ibm.com (Jordi Caubet Serrabou) Date: Mon, 5 Oct 2020 06:27:39 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: Message-ID: ?Coming to the routing point, is there any reason why you need it ? I mean, this is because GPFS trying to connect between compute nodes or a reason outside GPFS scope ? If the reason is GPFS, imho best approach - without knowledge of the licensing you have - would be to use separate clusters: a storage cluster and two compute clusters. Both compute clusters join using multicluster setup the storage cluster. There is no need both compute clusters see each other, they only need to see the storage cluster. One of the clusters using the 10G, the other cluster using the IPoIB interface. You need at least three quorum nodes in each compute cluster but if licensing is per drive on the DSS, it is covered. -- Jordi Caubet Serrabou IBM Software Defined Infrastructure (SDI) and Flash Technical Sales Specialist Technical Computing and HPC IT Specialist and Architect Ext. Phone: (+34) 679.79.17.84 (internal 55834) E-mail: jordi.caubet at es.ibm.com > On 5 Oct 2020, at 08:19, Olaf Weiser wrote: > > ? > let me add a few comments from some very successful large installations in Eruope > > # InterOP > Even though (as Luis pointed to) , there is no support statement to run intermix DSS/ESS in general, it was ~, and is, and will be, ~ allowed for short term purposes, such as e.g migration. > The reason to not support those DSS/ESS mixed configuration in general is simply driven by the fact, that different release version of DSS/ESS potentially (not in every release, but sometimes) comes with different driver levels, (e.g. MOFED), OS, RDMA-settings, GPFS tuning, etc... > Those changes can have an impact/multiple impacts and therefore, we do not support that in general. Of course -and this would be the advice for every one - if you are faced the need to run a mixed configuration for e.g. a migration and/or e.g. cause of you need to temporary provide space etc... contact you IBM representative and settle to plan that accordingly.. > There will be (likely) some additional requirements/dependencies defined like driver versions, OS, and/or Scale versions, but you'll get a chance to run mixed configuration - temporary limited to your specific scenario. > > # Monitoring > No doubt, monitoring is essential and absolutely needed. 
- and/but - IBM wants customers to be very sensitive, what kind of additional software (=workload) gets installed on the ESS-IO servers. BTW, this rule applies as well to any other important GPFS node with special roles (e.g. any other NSD server etc) > But given the fact, that customer's usually manage and monitor their server farms from a central point of control (any 3rd party software), it is common/ best practice , that additionally monitor software(clients/endpoints) has to run on GPFS nodes, so as on ESS nodes too. > > If that way of acceptance applies for DSS too, you may want to double check with Lenovo ?! > > > #additionally GW functions > It would be a hot iron, to general allow routing on IO nodes. Similar to the mixed support approach, the field variety for such a statement would be hard(==impossible) to manage. As we all agree, additional network traffic can (and in fact will) impact GPFS. > In your special case, the expected data rates seems to me more than ok and acceptable to go with your suggested config (as long workloads remain on that level / monitor it accordingly as you are already obviously doing) > Again,to be on the safe side.. contact your IBM representative and I'm sure you 'll find a way.. > > > > kind regards.... > olaf > > > ----- Original message ----- > From: Jonathan Buzzard > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] Services on DSS/ESS nodes > Date: Sun, Oct 4, 2020 12:17 PM > > On 04/10/2020 10:29, Luis Bolinches wrote: > > Hi > > > > As stated on the same link you can do remote mounts from each other and > > be a supported setup. > > > > ? You can use the remote mount feature of IBM Spectrum Scale to share > > file system data across clusters.? > > > > You can, but imagine I have a DSS-G cluster, with 2PB of storage on it > which is quite modest in 2020. It is now end of life and for whatever > reason I decide I want to move to ESS instead. > > What any sane storage admin want to do at this stage is set the ESS, add > the ESS nodes to the existing cluster on the DSS-G then do a bit of > mmadddisk/mmdeldisk and sit back while the data is seemlessly moved from > the DSS-G to the ESS. Admittedly this might take a while :-) > > Then once all the data is moved a bit of mmdelnode and bingo the storage > has been migrated from DSS-G to ESS with zero downtime. > > As that is not allowed for what I presume are commercial reasons (you > could do it in reverse and presumable that is what IBM don't want) then > once you are down the rabbit hole of one type of storage the you are not > going to switch to a different one. > > You need to look at it from the perspective of the users. They frankly > could not give a monkeys what storage solution you are using. All they > care about is having usable storage and large amounts of downtime to > switch from one storage type to another is not really acceptable. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Salvo indicado de otro modo m?s arriba / Unless stated otherwise above: International Business Machines, S.A. Santa Hortensia, 26-28, 28002 Madrid Registro Mercantil de Madrid; Folio 1; Tomo 1525; Hoja M-28146 CIF A28-010791 -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Oct 5 09:40:56 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 5 Oct 2020 08:40:56 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> Message-ID: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> > I now need to check IBM are not going to throw a wobbler down the line > if I need to get support before deploying it to the DSS-G nodes :-) I know there were a lot of other emails about this ... I think you maybe want to be careful doing this. Whilst it might work when you setup the DSS-G like this, remember that the memory usage you are seeing at this point in time may not be what you always need. For example if you fail-over the recovery groups, you need to have enough free memory to handle this. E.g. a node failure, or more likely you are upgrading the building blocks. Personally I wouldn't run other things like this on my DSS-G storage nodes. We do run e.g. nrpe monitoring to collect and report faults, but this is pretty lightweight compared to everything else. They even removed support for running the gui packages on the IO nodes - the early DSS-G builds used the IO nodes for this, but now you need separate systems for this. Simon From jonathan.buzzard at strath.ac.uk Mon Oct 5 12:44:48 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 5 Oct 2020 12:44:48 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> Message-ID: <905a0bdb-b6a1-90e4-bf57-ed8edae6fb7c@strath.ac.uk> On 05/10/2020 07:27, Jordi Caubet Serrabou wrote: > ?Coming to the routing point, is there any reason why you need it ? I > mean, this is because GPFS trying to connect between compute nodes or > a reason outside GPFS scope ? > If the reason is GPFS, imho best approach - without knowledge of the > licensing you have - would be to use separate clusters: a storage > cluster and two compute clusters. The issue is that individual nodes want to talk to one another on the data interface. Which caught me by surprise as the cluster is set to admin mode central. The admin interface runs over ethernet for all nodes on a specific VLAN which which is given 802.1p priority 5 (that's Voice, < 10 ms latency and jitter). That saved a bunch of switching and cabling as you don't need the extra interface for the admin traffic. The cabling already significantly restricts airflow for a compute rack as it is, without adding a whole bunch more for a barely used admin interface. 
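(For reference, steering the admin traffic onto that VLAN is just a per-node attribute in Scale rather than anything that needs extra hardware; roughly, with made-up hostnames,

    mmchconfig adminMode=central
    mmchnode --admin-interface=node001-adm.example.local -N node001

where node001-adm resolves to the address on the admin VLAN. No extra switch ports or cables, just a tagged interface on each node.)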
It's like the people who wrote the best practice about separate interface for the admin traffic know very little about networking to be frankly honest. This is all last century technology. The nodes for undergraduate teaching only have a couple of 1Gb ethernet ports which would suck for storage usage. However they also have QDR Infiniband. That is because even though undergraduates can't run multinode jobs, on the old cluster the Lustre storage was delivered over Infiniband, so they got Infiniband cards. > Both compute clusters join using multicluster setup the storage > cluster. There is no need both compute clusters see each other, they > only need to see the storage cluster. One of the clusters using the > 10G, the other cluster using the IPoIB interface. > You need at least three quorum nodes in each compute cluster but if > licensing is per drive on the DSS, it is covered. Three clusters is starting to get complicated from an admin perspective. The biggest issue is coordinating maintenance and keep sufficient quorum nodes up. Maintenance on compute nodes is done via the job scheduler. I know some people think this is crazy, but it is in reality extremely elegant. We can schedule a reboot on a node as soon as the current job has finished (usually used for firmware upgrades). Or we can schedule a job to run as root (usually for applying updates) as soon as the current job has finished. As such we have no way of knowing when that will be for a given node, and there is a potential for all three quorum nodes to be down at once. Using this scheme we can seamlessly upgrade the nodes safe in the knowledge that a node is either busy and it's running on the current configuration or it has been upgraded and is running the new configuration. Consequently multinode jobs are guaranteed to have all nodes in the job running on the same configuration. The alternative is to drain the node, but there is only a 23% chance the node will become available during working hours leading to a significant loss of compute time when doing maintenance compared to our existing scheme where the loss of compute time is only as long as the upgrade takes to install. Pretty much the only time we have idle nodes is when the scheduler is reserving nodes ready to schedule a multi node job. Right now we have a single cluster with the quorum nodes being the two DSS-G nodes and the node used for backup. It is easy to ensure that quorum is maintained on these, they also all run real RHEL, where as the compute nodes run CentOS. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From carlz at us.ibm.com Mon Oct 5 13:09:02 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Mon, 5 Oct 2020 12:09:02 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes Message-ID: <714B599F-D06D-4D03-98F3-A2BF6F7360DB@us.ibm.com> Jordi wrote: ?Both compute clusters join using multicluster setup the storage cluster. There is no need both compute clusters see each other, they only need to see the storage cluster. One of the clusters using the 10G, the other cluster using the IPoIB interface. You need at least three quorum nodes in each compute cluster but if licensing is per drive on the DSS, it is covered.? 
As a side note: One of the reasons we designed capacity (per Disk or per TB) licensing the way we did was specifically so that you could make this kind of architectural decision on its own merits, without worrying about a licensing penalty. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1243111775] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From carlz at us.ibm.com Mon Oct 5 13:20:25 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Mon, 5 Oct 2020 12:20:25 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes Message-ID: <288C3527-32BA-43E2-B5EF-E79CC5765424@us.ibm.com> >> Mixing DSS and ESS in the same cluster is not a supported configuration. > > I know, it means you can never ever migrate your storage from DSS to ESS > without a full backup and restore. Who with any significant amount of > storage is going to want to do that? The logic behind this escapes me, > or perhaps in that scenario IBM might relax the rules for the migration > period. > We do indeed relax the rules temporarily for a migration. The reasoning behind this rule is for support. Many Scale support issues - often the toughest ones - are not about a single node, but about the cluster or network as a whole. So if you have a mix of IBM systems with systems supported by an OEM (this applies to any OEM by the way, not just Lenovo) and a cluster-wide issue, who are you going to call. (Well, in practice you?re going to call IBM and we?ll do our best to help you despite limits on our knowledge of the OEM systems?). --CZ Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_386371469] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From jonathan.buzzard at strath.ac.uk Mon Oct 5 14:39:12 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 5 Oct 2020 14:39:12 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> Message-ID: On 05/10/2020 09:40, Simon Thompson wrote: >> I now need to check IBM are not going to throw a wobbler down the >> line if I need to get support before deploying it to the DSS-G >> nodes :-) > > I know there were a lot of other emails about this ... > > I think you maybe want to be careful doing this. Whilst it might work > when you setup the DSS-G like this, remember that the memory usage > you are seeing at this point in time may not be what you always need. > For example if you fail-over the recovery groups, you need to have > enough free memory to handle this. E.g. a node failure, or more > likely you are upgrading the building blocks. I think there is a lack of understanding on exactly how light weight keepalived is. It's the same code as on my routers which are admittedly different CPU's (MIPS to be precise) but memory usage (taking out shared memory usage - libc for example is loaded anyway) is under 200KB. 
A bash shell uses more memory... > > Personally I wouldn't run other things like this on my DSS-G storage > nodes. We do run e.g. nrpe monitoring to collect and report faults, > but this is pretty lightweight compared to everything else. They even > removed support for running the gui packages on the IO nodes - the > early DSS-G builds used the IO nodes for this, but now you need > separate systems for this. > And keepalived is in the same range as nrpe, which you do run :-) I have seen nrpe get out of hand and consume significant amounts of resources on a machine; the machine was ground to halt due to nrpe. One of the standard plugins was failing and sitting their busy waiting. Every five minutes it ran again. It of course decided to wait till ~7pm on a Friday to go wonky. By mid morning on Saturday it was virtually unresponsive, several minutes to get a shell... I would note that you can run keepalived quite happily on an Ubiquiti EdgeRouter X which has a dual core 880 MHz MIPS CPU with 256MB of RAM. Mikrotik have models with similar specs that run it too. On a dual Xeon Gold 6142 machine the usage of RAM and CPU by keepalived is noise. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From committee at io500.org Thu Oct 1 17:40:00 2020 From: committee at io500.org (committee at io500.org) Date: Thu, 01 Oct 2020 10:40:00 -0600 Subject: [gpfsug-discuss] IO500 SC20 Call for Submission Message-ID: <4a20ed6ae985a25c69d953e1ea633d62@io500.org> CALL FOR IO500 SUBMISSION Deadline: 30 October 2020 AoE Stabilization period: 1st October -- 9th October 2020 AoE The IO500 [1] is now accepting and encouraging submissions for the upcoming 7th IO500 list, to be revealed at the IO500 Virtual BOF during SC20. Once again, we are also accepting submissions to the 10 Node I/O Challenge to encourage submission of small scale results. The new ranked lists will be announced at our Virtual SC20 BoF. We hope to see you, and your results, there. A new change for the upcoming submission procedure is the introduction of a stabilization period that aims to harden the benchmark. The final benchmark is released at the end of this period. During the stabilization we encourage the community to test the proper execution of the benchmark and provide us with feedback. We will apply bug fixes to the code base and expect that results obtained will be valid as full submission. We also continue with another list for the Student Cluster Competition, since IO500 is used during this competition. Also new this year is that we have partnered with Anthony Kougkas' team at Illinois Institute of Technology to evaluate the submission metadata describing the storage system on which the test was run to improve the quality and usefulness of the data IO500 collects. You may be contacted by one of his students to clarify one or more of the metadata items from your submission(s). We would appreciate, but do not require, your cooperation to help improve the submission metadata quality. Results from their work will be fed back to improve our submission process for future lists. The IO500 benchmark suite is designed to be easy to run, and the community has multiple active support channels to help with any questions. Please submit results from your system, and we look forward to seeing many of you at SC20! 
Please note that submissions of all sizes are welcome, including multiple submissions from different storage systems/tiers at a single site. The website has customizable sorting so it is possible to submit on a small system and still get a very good per-client score, for example. Additionally, the list is about much more than just the raw rank; all submissions help the community by collecting and publishing a wider corpus of data. More details below. Following the success of the Top500 in collecting and analyzing historical trends in supercomputer technology and evolution, the IO500 [1] was created in 2017, published its first list at SC17, and has grown continuously since then. The need for such an initiative has long been known within High-Performance Computing; however, defining appropriate benchmarks had long been challenging. Despite this challenge, the community, after long and spirited discussion, finally reached consensus on a suite of benchmarks and a metric for resolving the scores into a single ranking. The multi-fold goals of the benchmark suite are as follows: * Maximizing simplicity in running the benchmark suite * Encouraging complexity in tuning for performance * Allowing submitters to highlight their "hero run" performance numbers * Forcing submitters to simultaneously report performance for challenging IO patterns. Specifically, the benchmark suite includes a hero-run of both IOR and mdtest configured however possible to maximize performance and establish an upper-bound for performance. It also includes an IOR and mdtest run with highly prescribed parameters in an attempt to determine a lower-bound on the performance. Finally, it includes a namespace search, as this has been determined to be a highly sought-after feature in HPC storage systems that have historically not been well-measured. Submitters are encouraged to share their tuning insights for publication. The goals of the community are also multi-fold: * Gather historical data for the sake of analysis and to aid predictions of storage futures * Collect tuning information to share valuable performance optimizations across the community * Encourage vendors and designers to optimize for workloads beyond "hero runs" * Establish bounded expectations for users, procurers, and administrators 10 NODE I/O CHALLENGE The 10 Node Challenge is conducted using the regular IO500 benchmark, however, with the rule that exactly 10 client nodes must be used to run the benchmark. You may use any shared storage with, e.g., any number of servers. When submitting for the IO500 list, you can opt-in for "Participate in the 10 compute node challenge only", then we will not include the results into the ranked list. Other 10-node node submissions will be included in the full list and in the ranked list. We will announce the result in a separate derived list and in the full list but not on the ranked IO500 list at https://io500.org/ [2] BIRDS-OF-A-FEATHER Once again, we encourage you to submit [1], to join our community, and to attend our virtual BoF "The IO500 and the Virtual Institute of I/O" at SC20, where we will announce the new IO500 list, the 10 node challenge list, and the Student Cluster Competition list. We look forward to answering any questions or concerns you might have. * [1] http://www.vi4io.org/io500/submission [3] Thanks, The IO500 Committee Links: ------ [1] http://io500.org/ [2] https://io500.org/ [3] http://www.vi4io.org/io500/submission -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From valdis.kletnieks at vt.edu Wed Oct 7 00:45:46 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Tue, 06 Oct 2020 19:45:46 -0400 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: <138651.1602027946@turing-police> On Sat, 03 Oct 2020 10:55:05 -0000, "Andrew Beattie" said: > Why do you need to run any kind of monitoring client on an IO server the > GUI / performance monitor already does all of that work for you and > collects the data on the dedicated EMS server. Does *ALL* that work for me? Will it toss you an alert if your sshd goes away, or if somebody's tossing packets that iptables is blocking for good reasons, or any of the many other things that a competent sysadmin wants to be alerted on that aren't GPFS, but which are things that Nagios and Zabbix and similar tools were invented to track? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Wed Oct 7 11:28:55 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 7 Oct 2020 10:28:55 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <138651.1602027946@turing-police> References: <138651.1602027946@turing-police> Message-ID: Agreed ... Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*. Tell me that kswapd is having one of those days. Tell me rsyslogd has stopped sending for some reason. Tell me if there are long waiters on the hosts. Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ... Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems... Simon ?On 07/10/2020, 00:45, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Valdis Kl?tnieks" wrote: On Sat, 03 Oct 2020 10:55:05 -0000, "Andrew Beattie" said: > Why do you need to run any kind of monitoring client on an IO server the > GUI / performance monitor already does all of that work for you and > collects the data on the dedicated EMS server. Does *ALL* that work for me? Will it toss you an alert if your sshd goes away, or if somebody's tossing packets that iptables is blocking for good reasons, or any of the many other things that a competent sysadmin wants to be alerted on that aren't GPFS, but which are things that Nagios and Zabbix and similar tools were invented to track? From jonathan.buzzard at strath.ac.uk Wed Oct 7 13:14:45 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 7 Oct 2020 13:14:45 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: <138651.1602027946@turing-police> Message-ID: On 07/10/2020 11:28, Simon Thompson wrote: > Agreed ... > > Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*. > Tell me that kswapd is having one of those days. > Tell me rsyslogd has stopped sending for some reason. > Tell me if there are long waiters on the hosts. > Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ... > > Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems... > The problem is the developers know as much about looking after a system in the real world as a tea leaf knows the history of the East India Company. 
IMHO to even ask the question shows a total lack of understanding of the issue. Consequently developers in their ivory towers have a habit of developing things that are as useful as a chocolate tea pot. Which putting it bluntly a competent sysadmins makes them look like a bunch of twits. I would note this is not a problem unique to IBM, it's developers in general. The appropriate course of action would be not for IBM to develop a monitoring tool of their own but to provide a bunch of plugins for the popular monitoring tools that customers will already be using to monitor their whole IT estate. Heaven forbid they could even run a poll to find out which ones the actual customers of their products are interested in rather than wasting effort developing software their customers are not actually interested in. For my purposes there is I think an alternative. The actual routing of the IP packets is not a service, it's a kernel configuration to have the kernel route that packets :-) Keepalived just manages a floating IP address. There are other options to achieve this. They are clunkier but they side step IBM's silly rules. I would however note at this point that at lots of sites all routing in the data centre is done using BGP. It comes in part out of the zero trust paradigm. I guess apparently running fail2ban is not permitted either. Can I even run firewalld? As you can seen a nothing else policy quickly becomes unsustainable IMHO. There is a disjuncture between the developers in their ivory towers and the real world. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From kkr at lbl.gov Tue Oct 13 22:34:23 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 13 Oct 2020 14:34:23 -0700 Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a panel? Message-ID: Hi all, By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we?re planning a couple 90-minute sessions and would like to do a panel during one of them. We?ll hope to do live Q&A, like an in-person Ask Me Anything session, but it?s probably a good idea to have a bank of questions ready as well, Also, that way your question may get asked, even if you can?t make the live session ?we?ll record these sessions for later viewing. So, please send your questions for the panel and we?ll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots: November 16th - 8:00 AM Pacific/3:00 PM UTC November 18th - 8:00 AM Pacific/3:00 PM UTC Best, Kristy Kristy Kallback-Rose Senior HPC Storage Systems Analyst National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From juergen.hannappel at desy.de Wed Oct 21 17:13:01 2020 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Wed, 21 Oct 2020 18:13:01 +0200 (CEST) Subject: [gpfsug-discuss] Mounting an nfs share on a CES node Message-ID: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de> Hi, I have a CES node exporting some filesystems vis smb and ganesha in a standard CES setup. Now I want to mount a nfs share from a different, non-CES server on this CES node. 
This did not work: mount -o -fstype=nfs4,minorversion=1,rw,rsize=65536,wsize=65536 some.other.server:/some/path /mnt/ mount.nfs: mount to NFS server 'some.other.server:/some/path' failed: RPC Error: Program unavailable Does the CES software stack interfere with the nfs client setup? It seems that at least with rpc-statd there is some conflict: systemctl status rpc-statd ? rpc-statd.service - NFS status monitor for NFSv2/3 locking. Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; static; vendor preset: disabled) Active: failed (Result: exit-code) since Wed 2020-10-21 17:48:21 CEST; 22min ago Process: 19896 ExecStart=/usr/sbin/rpc.statd $STATDARGS (code=exited, status=1/FAILURE) Oct 21 17:48:21 mynode systemd[1]: Starting NFS status monitor for NFSv2/3 locking.... Oct 21 17:48:21 mynode rpc.statd[19896]: Statd service already running! Oct 21 17:48:21 mynode systemd[1]: rpc-statd.service: control process exited, code=exited status=1 Oct 21 17:48:21 mynode systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking.. Oct 21 17:48:21 mynode systemd[1]: Unit rpc-statd.service entered failed state. Oct 21 17:48:21 mynode systemd[1]: rpc-statd.service failed. -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 From mnaineni at in.ibm.com Thu Oct 22 04:38:59 2020 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 22 Oct 2020 03:38:59 +0000 Subject: [gpfsug-discuss] Mounting an nfs share on a CES node In-Reply-To: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de> References: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de> Message-ID: An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Tue Oct 27 11:46:02 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 27 Oct 2020 12:46:02 +0100 (CET) Subject: [gpfsug-discuss] Alternative to Scale S3 API. Message-ID: <1109480230.484366.1603799162955@privateemail.com> Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! -------------- next part -------------- An HTML attachment was scrubbed... URL: From NISHAAN at za.ibm.com Tue Oct 27 13:38:01 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Tue, 27 Oct 2020 15:38:01 +0200 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: <1109480230.484366.1603799162955@privateemail.com> References: <1109480230.484366.1603799162955@privateemail.com> Message-ID: Hi Andi The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. 
Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html Not sure if there is any other way to talk HTTPS without using HAProxy. In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). Anyway, good luck with your testing. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 2020/10/27 13:59 Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 52733301.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From andi at christiansen.xxx Wed Oct 28 06:24:52 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 28 Oct 2020 07:24:52 +0100 (CET) Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: References: <1109480230.484366.1603799162955@privateemail.com> Message-ID: <2126571944.509878.1603866292369@privateemail.com> Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? 
Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example... I looked through your documentation and its very close to what we have set up today and found to not work... If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen.. Best Regards Andi Christiansen > On 10/27/2020 2:38 PM Nishaan Docrat wrote: > > > > Hi Andi > > The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. > > I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. > > You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html https://docs.openstack.org/swift/latest/s3_compat.html > > Not sure if there is any other way to talk HTTPS without using HAProxy. > > In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. > > https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ > > Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). > > Anyway, good luck with your testing. > > Kind Regards > > Nishaan Docrat > Client Technical Specialist - Storage Systems > IBM Systems Hardware > > Work: +27 (0)11 302 5001 > Mobile: +27 (0)81 040 3793 > Email: nishaan at za.ibm.com http://www.ibm.com/storage > > > > [Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withi]Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > Date: 2020/10/27 13:59 > Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > --------------------------------------------- > > > > Hi all, > > > > We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. 
> > > > Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. > > > > We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. > > The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. > > > > If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? > > Thank you in advance! > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 52733301.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From NISHAAN at za.ibm.com Wed Oct 28 06:45:29 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Wed, 28 Oct 2020 08:45:29 +0200 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: <2126571944.509878.1603866292369@privateemail.com> References: <1109480230.484366.1603799162955@privateemail.com> <2126571944.509878.1603866292369@privateemail.com> Message-ID: Hi Andi The s3api middleware does support multipart uploads.. https://docs.openstack.org/swift/latest/s3_compat.html The current version of Swift (PIKE) that is bundled with Spectrum Scale 5.0.X doesn't.. https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adm_ManagingOpenStackACLsviaAmazonS3API.htm According to the Spectrum Scale Roadmap, 5.1 is due out 2H20.. Not sure if someone from development can confirm the GA date. Does Veeam have a test utility? You could always test it using the current Swift AIO or if you can provide me with a test utility I can test that for you. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com From: Andi Christiansen To: gpfsug main discussion list , Nishaan Docrat Date: 2020/10/28 08:24 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some... This Message Is From an External Sender This message came from outside your organization. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... 
but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example... I looked through your documentation and its very close to what we have set up today and found to not work... If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen.. Best Regards Andi Christiansen On 10/27/2020 2:38 PM Nishaan Docrat wrote: Hi Andi The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html Not sure if there is any other way to talk HTTPS without using HAProxy. In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). Anyway, good luck with your testing. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withi Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 2020/10/27 13:59 Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application (rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. 
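To make the HAProxy/443 part of this concrete, the usual shape is a TLS-terminating front end on 443 that proxies to the object (Swift) endpoint on the protocol nodes. The fragment below is only a sketch: the certificate path, server addresses and the 8080 back-end port are assumptions for illustration rather than the actual setup, and the health check presumes the Swift healthcheck middleware is enabled.

# haproxy.cfg fragment (illustrative only)
frontend s3_https
    bind *:443 ssl crt /etc/haproxy/certs/backup.mycompany.com.pem
    mode http
    default_backend ces_object

backend ces_object
    mode http
    balance roundrobin
    option httpchk GET /healthcheck
    server ces1 192.0.2.11:8080 check
    server ces2 192.0.2.12:8080 check

Running the 443 front end on a node, or a floating address, that is not also serving the management GUI is one way to keep the two from fighting over the same port.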
and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19991351.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From NISHAAN at za.ibm.com Wed Oct 28 07:12:55 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Wed, 28 Oct 2020 09:12:55 +0200 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: <2126571944.509878.1603866292369@privateemail.com> References: <1109480230.484366.1603799162955@privateemail.com> <2126571944.509878.1603866292369@privateemail.com> Message-ID: Hi Andi Sorry forgot to mention that I was told 5.1 will include the Swift Train release (2.23). The change from swift3 middleware to s3api was done in the Queens release (2.18) so 5.1 will definitely have multipart support. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com From: Andi Christiansen To: gpfsug main discussion list , Nishaan Docrat Date: 2020/10/28 08:24 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some... This Message Is From an External Sender This message came from outside your organization. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example... I looked through your documentation and its very close to what we have set up today and found to not work... If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen.. Best Regards Andi Christiansen On 10/27/2020 2:38 PM Nishaan Docrat wrote: Hi Andi The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. 
Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html Not sure if there is any other way to talk HTTPS without using HAProxy. In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). Anyway, good luck with your testing. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withi Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 2020/10/27 13:59 Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application (rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17810834.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From luis.bolinches at fi.ibm.com Wed Oct 28 07:15:21 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 28 Oct 2020 07:15:21 +0000 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: Message-ID: An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.2__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.2__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From NISHAAN at za.ibm.com Wed Oct 28 07:45:45 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Wed, 28 Oct 2020 09:45:45 +0200 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: References: Message-ID: Hi Luis Thanks for your reply.. It should address Andi's issue as the underlying Swift version is what is important and the functionality he needs is in the latest releases (I was told 5.1 includes Swift Train which is the latest version). Am sure there is a beta program for Spectrum Scale.. Perhaps Andi should speak to his software sales rep and ask to be included on it to get access so that he can test. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com From: "Luis Bolinches" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 2020/10/28 09:29 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi 5.1.x is going GA very soon (TM). Would it address the issues Andi sees on his environment or not I cannot say. I can take it with Andi for more details on the GA date -- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions Luis Bolinches Consultant IT Specialist IBM Spectrum Scale development Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches Ab IBM Finland Oy Laajalahdentie 23 00330 Helsinki Uusimaa - Finland "If you always give you will always have" -- Anonymous ----- Original message ----- From: "Nishaan Docrat" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: Andi Christiansen Cc: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API. Date: Wed, Oct 28, 2020 08:47 Hi Andi The s3api middleware does support multipart uploads.. https://docs.openstack.org/swift/latest/s3_compat.html The current version of Swift (PIKE) that is bundled with Spectrum Scale 5.0.X doesn't.. https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adm_ManagingOpenStackACLsviaAmazonS3API.htm According to the Spectrum Scale Roadmap, 5.1 is due out 2H20.. Not sure if someone from development can confirm the GA date. Does Veeam have a test utility? You could always test it using the current Swift AIO or if you can provide me with a test utility I can test that for you. 
Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com Inactive hide details for Andi Christiansen ---2020/10/28 08:24:55---Hi Nishaan, Thanks for you reply.Andi Christiansen ---2020/10/28 08:24:55---Hi Nishaan, Thanks for you reply. From: Andi Christiansen To: gpfsug main discussion list , Nishaan Docrat Date: 2020/10/28 08:24 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some... This Message Is From an External Sender This message came from outside your organization. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example... I looked through your documentation and its very close to what we have set up today and found to not work... If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen.. Best Regards Andi Christiansen On 10/27/2020 2:38 PM Nishaan Docrat wrote: Hi Andi The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html Not sure if there is any other way to talk HTTPS without using HAProxy. In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). Anyway, good luck with your testing. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withiAndi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. 
And that has shown that From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 2020/10/27 13:59 Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 16781831.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From luis.bolinches at fi.ibm.com Wed Oct 28 07:51:30 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 28 Oct 2020 07:51:30 +0000 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=CDBB0C9CDFB9D9C48f9e8a93df938690918cCDB at .jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.2__=CDBB0C9CDFB9D9C48f9e8a93df938690918cCDB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Thu Oct 29 11:16:13 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 29 Oct 2020 11:16:13 +0000 Subject: [gpfsug-discuss] SSUG Digital Expert Talk: 11/4 - AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale Message-ID: <77EA43ED-C430-42CA-872E-D2307F244775@nuance.com> Reminder for our upcoming expert talk: SSUG::Digital: Scalable multi-node training for AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale November 4 @ 16:15 - 17:45 GMT Nvidia and IBM did a complex proof-of-concept to demonstrate the scaling of AI workload using Nvidia DGX, Red Hat OpenShift and IBM Spectrum Scale at the example of ResNet-50 and the segmentation of images using the Audi A2D2 dataset. The project team published an IBM Redpaper with all the technical details and will present the key learnings and results. Registration link for Webex session: https://www.spectrumscaleug.org/event/ssugdigital-multi-node-training-for-ai-workloads/ Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Oct 29 21:43:02 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 29 Oct 2020 14:43:02 -0700 Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a panel? In-Reply-To: References: Message-ID: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov> Really? There?s nothing you want to ask about GPFS/Spectrum Scale? There will be access to developers and management alike, so I have to imagine you have something to ask? Don?t be shy. Please help make this a lively discussion by submitting a question, or two. Best, Kristy > On Oct 13, 2020, at 2:34 PM, Kristy Kallback-Rose wrote: > > Hi all, > > By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we?re planning a couple 90-minute sessions and would like to do a panel during one of them. We?ll hope to do live Q&A, like an in-person Ask Me Anything session, but it?s probably a good idea to have a bank of questions ready as well, Also, that way your question may get asked, even if you can?t make the live session ?we?ll record these sessions for later viewing. > > So, please send your questions for the panel and we?ll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots: > > November 16th - 8:00 AM Pacific/3:00 PM UTC > > November 18th - 8:00 AM Pacific/3:00 PM UTC > > Best, > Kristy > > Kristy Kallback-Rose > Senior HPC Storage Systems Analyst > National Energy Research Scientific Computing Center > Lawrence Berkeley National Laboratory > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Oct 29 21:49:34 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 29 Oct 2020 14:49:34 -0700 Subject: [gpfsug-discuss] SC20 Sessions - Dates and times are settled, please join us! Message-ID: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov> Hi all, The Spectrum Scale User Group will be hosting two 90 minute sessions at SC20 this year and we hope you can join us. The first one is: "Storage for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST and the second one is "What's new in Spectrum Scale 5.1?" 
and will be held Wednesday, Nov. 18th from 11:00-12:30 EST.

Please see the calendar at https://www.spectrumscaleug.org/eventslist/2020-11/ and register by clicking on a session on the calendar and then the "Please register here to join the session" link.

Best, Kristy

Kristy Kallback-Rose Senior HPC Storage Systems Analyst National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory

From heinrich.billich at id.ethz.ch Fri Oct 30 12:21:58 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Fri, 30 Oct 2020 12:21:58 +0000 Subject: [gpfsug-discuss] 'ganesha_mgr display_export - client not listed Message-ID: <660DD807-C723-44EF-BC51-57EFB296FFC4@id.ethz.ch>

Hello,

Some nfsv4 client of ganesha does not show up in the output of 'ganesha_mgr display_export'. The client has an active mount, but also shows some nfs issues: some commands did hang, the process just stays in state D (uninterruptible sleep) according to 'ps', but not the whole mount. I just wonder if the client's IP should always show up in the output of display_export once the client did issue a mount call, and if its absence indicates that something is broken.

Putting it the other way round: when is a client listed in the output of display_export and when is it removed from the list?

We do collect more debug data, this is just something that caught my eye.

Thank you,

Heiner

We run ganesha 2.7.5-ibm058.05 on a Spectrum Scale system on RedHat 7.7. I crosspost to the gpfsug mailing list.

-- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ========================

# ganesha_mgr display_export 37
Display export with id 37
export 37: path = /xxxx/yyy, pseudo = /xxx/yyy , tag = /xxx/yyy
Client type, CIDR version, CIDR address, CIDR mask, CIDR proto, Anonymous UID, Anonymous GID, Attribute timeout, Options, Set
a.b.c.198/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.143/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.236/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.34/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.70/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.71/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
*, 0, 0, 0, 0, 4294967294, 4294967294, 0, 1126187490, 1081209831

From skylar2 at uw.edu Fri Oct 30 14:01:37 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Fri, 30 Oct 2020 07:01:37 -0700 Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a panel? In-Reply-To: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov> References: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov> Message-ID: <20201030140137.hakhxwppcmaoixy6@thargelion>

Here's one: How is IBM working to improve the integration between TSM and GPFS? We're in the biomedical space and have some overlapping regulatory requirements around retention, which translate to complicated INCLUDE/EXCLUDE rules that mmbackup has always had trouble processing. In particular, we need to be able to INCLUDE particular paths to set a management class, but then EXCLUDE particular paths, which results in mmbackup generating file lists for dsmc including those excluded paths, which dsmc can exclude but it logs every single one every time it runs.
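As a made-up illustration of the kind of list this turns into (the paths and the management class name are invented, only the shape is real):

* include-exclude fragment, hypothetical paths and management class
* dsmc works its way up the list from the bottom, so the excludes
* below override the include that binds the management class
include     /gpfs/project/.../*          SEVENYEAR_MC
exclude     /gpfs/project/scratch/.../*
exclude.dir /gpfs/project/tmp

mmbackup has to fold rules like these into its policy scan, and when that translation misses an exclude the path lands in the dsmc file list anyway, which is where the per-file log noise described above comes from.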
There will be access to developers and management alike, so I have to imagine you have something to ask??? Don???t be shy. > > Please help make this a lively discussion by submitting a question, or two. > > Best, > Kristy > > > On Oct 13, 2020, at 2:34 PM, Kristy Kallback-Rose wrote: > > > > Hi all, > > > > By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we???re planning a couple 90-minute sessions and would like to do a panel during one of them. We???ll hope to do live Q&A, like an in-person Ask Me Anything session, but it???s probably a good idea to have a bank of questions ready as well, Also, that way your question may get asked, even if you can???t make the live session ???we???ll record these sessions for later viewing. > > > > So, please send your questions for the panel and we???ll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots: > > > > November 16th - 8:00 AM Pacific/3:00 PM UTC > > > > November 18th - 8:00 AM Pacific/3:00 PM UTC > > > > Best, > > Kristy > > > > Kristy Kallback-Rose > > Senior HPC Storage Systems Analyst > > National Energy Research Scientific Computing Center > > Lawrence Berkeley National Laboratory > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From cblack at nygenome.org Fri Oct 30 14:19:24 2020 From: cblack at nygenome.org (Christopher Black) Date: Fri, 30 Oct 2020 14:19:24 +0000 Subject: [gpfsug-discuss] SC20 Sessions - Dates and times are settled, please join us! In-Reply-To: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov> References: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov> Message-ID: <62E7471D-02B9-4C27-B0F0-4038CCB2C66E@nygenome.org> Could you talk about upcoming work to address excessive prefetch when reading small fractions of many large files? Some bioinformatics workloads have a client node reading relatively small regions of multiple 50GB+ files. We've seen this trigger excessive prefetch bandwidth (especially on 16MB block filesystem). Investigation shows that much of the prefetched data is never read, but cache gets full, evicts blocks, then more prefetch happens. We can avoid this by turning prefetch off, but that reduces speed of other workloads that read full files sequentially. Turning prefetch on and off based on job won't work well for our users. We've heard this would be addressed in gpfs 5.1 at the earliest and have provided an example workload to devs. They've done some great analysis and determined the problem is worse on large (16M) block filesystems (which are now the recommended and default on new ess filesystems with sub-block allocation enabled). Best, Chris ?On 10/29/20, 5:49 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Kristy Kallback-Rose" wrote: Hi all, The Spectrum Scale User Group will be hosting two 90 minute sessions at SC20 this year and we hope you can join us. The first one is: "Storage for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST and the second one is "What's new in Spectrum Scale 5.1?" and will be held Wednesday, Nov. 18th from 11:00-12:30 EST. 
Please see the calendar at https://urldefense.com/v3/__https://www.spectrumscaleug.org/eventslist/2020-11/__;!!C6sPl7C9qQ!G0wT65UH3HoMnjBM6_ZAVfZwWwJz5SoLE5gpB_LM0N8SNSU3TXItF31dfxG_8Pow$ and register by clicking on a session on the calendar and then the "Please register here to join the session" link. Best, Kristy Kristy Kallback-Rose Senior HPC Storage Systems Analyst National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!C6sPl7C9qQ!G0wT65UH3HoMnjBM6_ZAVfZwWwJz5SoLE5gpB_LM0N8SNSU3TXItF31df0lybvoA$ ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. From jonathan.buzzard at strath.ac.uk Fri Oct 2 17:14:12 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 2 Oct 2020 17:14:12 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes Message-ID: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> What if any are the rules around running additional services on DSS/ESS nodes with regard to support? Let me outline our scenario Our main cluster uses 10Gbps ethernet for storage with the DSS-G nodes hooked up with redundant 40Gbps ethernet. However we have an older cluster that is used for undergraduate teaching that only has 1Gbps ethernet and QDR Infiniband. With no money to upgrade this to 10Gbps ethernet to support this we flipped one of the ports on the ConnectX4 cards on each DSS-G node to Infiniband and using IPoIB run the teaching nodes in this way. However it means that we need an Ethernet to Infiniband gateway as the ethernet only connected nodes want to talk to the Infiniband connected ones on their Infiniband address. Not a problem we grabbed an old spare machine installed CentOS and configured it up to act as a bridge, and deploy a custom route to all the ethernet only connected nodes. It has been working fine for a couple of years now. The problem is that this becomes firstly a single point of failure, on hardware that is six years old now. Secondly to apply updates on the gateway machine means all the teaching nodes have to be drained and GPFS umounted to reboot the machine after updates have been installed. It is currently not getting patched as frequently as I would like (and required by the Scottish government). So thinking about it I have come to the conclusion that the ideal situation would be to use the DSS-G nodes as the gateway and run keepalived to move the gateway ethernet IP address between the two machines. It is idea because as long as one DSS-G node is up then there is a functioning gateway and nodes don't get ejected from the cluster. If both DSS-G nodes are down then there is no GPFS to mount anyway and lack of a gateway is a moot point. I grabbed a couple of the teaching compute nodes in the summer and trialed it out. It works a treat. 
I now need to check IBM are not going to throw a wobbler down the line if I need to get support before deploying it to the DSS-G nodes :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Fri Oct 2 23:19:15 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Fri, 2 Oct 2020 22:19:15 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> Message-ID: Jonathan, I suggest you get a formal statement from Lenovo as the DSS-G Platform is no longer an IBM platform. But for ESS based platforms the answer would be, it is not supported to run anything on the IO Servers other than GNR and the relevant Scale management services, due to the fact that if you lose an IO Server, or if you in an extended maintenance window the Server needs to host all the work that would be being performed by both IO servers. I don't know if Lenovo have different point if view. Regards, Andrew Sent from my iPhone > On 3 Oct 2020, at 02:14, Jonathan Buzzard wrote: > > > What if any are the rules around running additional services on DSS/ESS > nodes with regard to support? Let me outline our scenario > > Our main cluster uses 10Gbps ethernet for storage with the DSS-G nodes > hooked up with redundant 40Gbps ethernet. > > However we have an older cluster that is used for undergraduate teaching > that only has 1Gbps ethernet and QDR Infiniband. With no money to > upgrade this to 10Gbps ethernet to support this we flipped one of the > ports on the ConnectX4 cards on each DSS-G node to Infiniband and using > IPoIB run the teaching nodes in this way. > > However it means that we need an Ethernet to Infiniband gateway as the > ethernet only connected nodes want to talk to the Infiniband connected > ones on their Infiniband address. Not a problem we grabbed an old spare > machine installed CentOS and configured it up to act as a bridge, and > deploy a custom route to all the ethernet only connected nodes. It has > been working fine for a couple of years now. > > The problem is that this becomes firstly a single point of failure, on > hardware that is six years old now. Secondly to apply updates on the > gateway machine means all the teaching nodes have to be drained and GPFS > umounted to reboot the machine after updates have been installed. It is > currently not getting patched as frequently as I would like (and > required by the Scottish government). > > So thinking about it I have come to the conclusion that the ideal > situation would be to use the DSS-G nodes as the gateway and run > keepalived to move the gateway ethernet IP address between the two > machines. It is idea because as long as one DSS-G node is up then there > is a functioning gateway and nodes don't get ejected from the cluster. > If both DSS-G nodes are down then there is no GPFS to mount anyway and > lack of a gateway is a moot point. > > I grabbed a couple of the teaching compute nodes in the summer and > trialed it out. It works a treat. > > I now need to check IBM are not going to throw a wobbler down the line > if I need to get support before deploying it to the DSS-G nodes :-) > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sat Oct 3 11:06:41 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 3 Oct 2020 11:06:41 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: <7af0ac41-a280-5ecd-3658-7af761a4bf9b@strath.ac.uk> On 02/10/2020 23:19, Andrew Beattie wrote: > Jonathan, > I suggest you get a formal statement from Lenovo as the DSS-G Platform > is no longer an IBM platform. > > But for ESS based platforms the answer would be, it is not supported to > run anything on the IO Servers other than GNR and the relevant Scale > management services, due to the fact that if you lose an IO Server, or > if you in an extended maintenance window the Server needs to host all > the work that would be being performed by both IO servers. > In the past ~500 days the Infiniband to Ethernet gateway has shifted ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the DSS-G nodes have each shifted several PB. The proposed additional traffic is a drop in the ocean. On my actual routers which shift much more data (over 300TB externally) with an uptime of ~180 days at the moment the CPU time consumed by keepalived is just under 31 minutes or about 8 seconds a day. These are much punier CPU's too. The proposed additional CPU usage is another drop in the ocean. Given Lenovo sold the *same* configuration with x3650's and SR650's the "need all the CPU grunt" is somewhat fishy. Between the bid being submitted and actual tender award the SR650's came out and we paid a bit extra to uplift to the newer server hardware with exactly the same disk configuration. I believe IBM have done the same with the ESS/GNR servers too over time the same applies there too. IMHO given keepalived is a base RHEL package, IBM/Lenovo should be offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as a supported configuration for mixed network technology clusters :-) Running a couple extra servers for this purpose is obnoxious from an environmental standpoint. That's IBM's green credentials out the window if you ask me. I would note under those rules running a Nagios, Zabbix etc. client on the nodes is not permitted either. I would suggest that most sites would be rather unhappy about that :-) > I don't know if Lenovo have different point if view. > Problem is when I ring up for support on my DSS-G I speak to an IBM employee not a Lenovo one :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Sat Oct 3 11:55:05 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Sat, 3 Oct 2020 10:55:05 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <7af0ac41-a280-5ecd-3658-7af761a4bf9b@strath.ac.uk> Message-ID: Why do you need to run any kind of monitoring client on an IO server the GUI / performance monitor already does all of that work for you and collects the data on the dedicated EMS server. If you have a small storage environment the. 
Yes the processor and memory may feel like overkill, but tuned appropriately an IO server will use all the memory you can give it to drive IO performance, If you want to run a hybrid / non standard architecture then the IBM ESS / DGSS platform may not be the right platform in comparison to a build your own architecture, how ever you then take all the support issues onto your self rather than it being the vendors problem. Sent from my iPhone > On 3 Oct 2020, at 20:06, Jonathan Buzzard wrote: > > On 02/10/2020 23:19, Andrew Beattie wrote: >> Jonathan, >> I suggest you get a formal statement from Lenovo as the DSS-G Platform >> is no longer an IBM platform. >> >> But for ESS based platforms the answer would be, it is not supported to >> run anything on the IO Servers other than GNR and the relevant Scale >> management services, due to the fact that if you lose an IO Server, or >> if you in an extended maintenance window the Server needs to host all >> the work that would be being performed by both IO servers. >> > > In the past ~500 days the Infiniband to Ethernet gateway has shifted > ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the > DSS-G nodes have each shifted several PB. The proposed additional > traffic is a drop in the ocean. > > On my actual routers which shift much more data (over 300TB externally) > with an uptime of ~180 days at the moment the CPU time consumed by > keepalived is just under 31 minutes or about 8 seconds a day. These are > much punier CPU's too. The proposed additional CPU usage is another drop > in the ocean. > > Given Lenovo sold the *same* configuration with x3650's and SR650's the > "need all the CPU grunt" is somewhat fishy. Between the bid being > submitted and actual tender award the SR650's came out and we paid a bit > extra to uplift to the newer server hardware with exactly the same disk > configuration. I believe IBM have done the same with the ESS/GNR servers > too over time the same applies there too. > > IMHO given keepalived is a base RHEL package, IBM/Lenovo should be > offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as > a supported configuration for mixed network technology clusters :-) > > Running a couple extra servers for this purpose is obnoxious from an > environmental standpoint. That's IBM's green credentials out the window > if you ask me. > > I would note under those rules running a Nagios, Zabbix etc. client on > the nodes is not permitted either. I would suggest that most sites would > be rather unhappy about that :-) > > >> I don't know if Lenovo have different point if view. >> > > Problem is when I ring up for support on my DSS-G I speak to an IBM > employee not a Lenovo one :-) > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Sat Oct 3 12:19:36 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sat, 3 Oct 2020 11:19:36 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: Message-ID: Are you mixing those ESS DSS in the same cluster? 
Or you are only running DSS https://www.ibm.com/support/knowledgecenter/SSYSP8/gnrfaq.html?view=kc#supportqs__building Mixing DSS and ESS in the same cluster is not a supported configuration. You really need to talk with Lenovo as is your vendor. The fact that in your region your support is being given by an IBMer or not is not a relevant point. High enough in the chain always will end at IBM on any region as GNR is IBM tech for 17 years (yes 17) so if weird enough even on regions where Lenovo might do even third level it might end on development and/or research. But that is a Lenovo/IBM agreement not you and IBM. So please get the support statement from Lenovo about this and pls share it if you want/can so we all learn their position. Thanks. -- Cheers > On 3. Oct 2020, at 13.55, Andrew Beattie wrote: > > ? > Why do you need to run any kind of monitoring client on an IO server the GUI / performance monitor already does all of that work for you and collects the data on the dedicated EMS server. > > If you have a small storage environment the. Yes the processor and memory may feel like overkill, but tuned appropriately an IO server will use all the memory you can give it to drive IO performance, > > If you want to run a hybrid / non standard architecture then the IBM ESS / DGSS platform may not be the right platform in comparison to a build your own architecture, how ever you then take all the support issues onto your self rather than it being the vendors problem. > > Sent from my iPhone > > > On 3 Oct 2020, at 20:06, Jonathan Buzzard wrote: > > > > On 02/10/2020 23:19, Andrew Beattie wrote: > >> Jonathan, > >> I suggest you get a formal statement from Lenovo as the DSS-G Platform > >> is no longer an IBM platform. > >> > >> But for ESS based platforms the answer would be, it is not supported to > >> run anything on the IO Servers other than GNR and the relevant Scale > >> management services, due to the fact that if you lose an IO Server, or > >> if you in an extended maintenance window the Server needs to host all > >> the work that would be being performed by both IO servers. > >> > > > > In the past ~500 days the Infiniband to Ethernet gateway has shifted > > ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the > > DSS-G nodes have each shifted several PB. The proposed additional > > traffic is a drop in the ocean. > > > > On my actual routers which shift much more data (over 300TB externally) > > with an uptime of ~180 days at the moment the CPU time consumed by > > keepalived is just under 31 minutes or about 8 seconds a day. These are > > much punier CPU's too. The proposed additional CPU usage is another drop > > in the ocean. > > > > Given Lenovo sold the *same* configuration with x3650's and SR650's the > > "need all the CPU grunt" is somewhat fishy. Between the bid being > > submitted and actual tender award the SR650's came out and we paid a bit > > extra to uplift to the newer server hardware with exactly the same disk > > configuration. I believe IBM have done the same with the ESS/GNR servers > > too over time the same applies there too. > > > > IMHO given keepalived is a base RHEL package, IBM/Lenovo should be > > offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as > > a supported configuration for mixed network technology clusters :-) > > > > Running a couple extra servers for this purpose is obnoxious from an > > environmental standpoint. That's IBM's green credentials out the window > > if you ask me. 
> > > > I would note under those rules running a Nagios, Zabbix etc. client on > > the nodes is not permitted either. I would suggest that most sites would > > be rather unhappy about that :-) > > > > > >> I don't know if Lenovo have different point if view. > >> > > > > Problem is when I ring up for support on my DSS-G I speak to an IBM > > employee not a Lenovo one :-) > > > > > > JAB. > > > > -- > > Jonathan A. Buzzard Tel: +44141-5483420 > > HPC System Administrator, ARCHIE-WeSt. > > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sat Oct 3 18:16:33 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 3 Oct 2020 18:16:33 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: On 03/10/2020 11:55, Andrew Beattie wrote: > Why do you need to run any kind of monitoring client on an IO server the > GUI / performance monitor already does all of that work for you and > collects the data on the dedicated EMS server. Because any remotely sensible admin demands a single pane service monitoring system. If I have to look at A for everything but my DSS-G and B for my DSS-G that's an epic fail. I often feel there is a huge disjuncture between the people that develop systems and those that look after them; they think the world revolves around them. It is clear this is one of those cases. > > If you have a small storage environment the. Yes the processor and > memory may feel like overkill, but tuned appropriately an IO server will > use all the memory you can give it to drive IO performance, Right but the SR650's came with not only more CPU but more RAM than the x3650's. In which case why only 192GB of RAM? The SR650's can take much more than that. Why not 384GB of RAM :-) Right now we have a shade over 50GB of RAM being unused. Been way for like ever because we naughtily have a influx DB client setup for a Grafana dashboard. We also presumably naughtily have remote syslog to Splunk. > > If you want to run a hybrid / non standard architecture then the IBM ESS > / DGSS platform may not be the right platform in comparison to a build > your own architecture, how ever you then take all the support issues > onto your self rather than it being the vendors problem. > I don't see anything that says you can't have some clients ethernet connected and some Infiniband connected. That of course requires a gateway, and the most logical place to put it is on the ESS or DSS nodes IMHO. I will see what Lenovo has to say, but looks like the IBM position is decidedly let's burn the planet, who gives a dam. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From jonathan.buzzard at strath.ac.uk Sat Oct 3 18:16:39 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 3 Oct 2020 18:16:39 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: On 03/10/2020 12:19, Luis Bolinches wrote: > Are you mixing those ESS DSS in the same cluster? Or you are only > running DSS > Only running DSS. We are too far down the rabbit hole to ever switch to ESS now. > > Mixing DSS and ESS in the same cluster is not a supported configuration. > I know, it means you can never ever migrate your storage from DSS to ESS without a full backup and restore. Who with any significant amount of storage is going to want to do that? The logic behind this escapes me, or perhaps in that scenario IBM might relax the rules for the migration period. > You really need to talk with Lenovo as is your vendor. The fact that in > your region your support is being given by an IBMer or not is not a > relevant point. High enough in the chain always will end at IBM on any > region as GNR is IBM tech for 17 years (yes 17) so if weird enough even > on regions where Lenovo might do even third level it might end on > development and/or research. But that is a Lenovo/IBM agreement not you > and IBM. > > So please get the support statement from Lenovo about this and pls share > it if you want/can so we all learn their position. > Will attempt that, though I do think it should be a supported config out the box :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From luis.bolinches at fi.ibm.com Sun Oct 4 10:29:34 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sun, 4 Oct 2020 09:29:34 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: Message-ID: Hi As stated on the same link you can do remote mounts from each other and be a supported setup. ? You can use the remote mount feature of IBM Spectrum Scale to share file system data across clusters.? -- Cheers > On 3. Oct 2020, at 20.16, Jonathan Buzzard wrote: > > ?On 03/10/2020 12:19, Luis Bolinches wrote: >> Are you mixing those ESS DSS in the same cluster? Or you are only >> running DSS > > Only running DSS. We are too far down the rabbit hole to ever switch to > ESS now. > >> Mixing DSS and ESS in the same cluster is not a supported configuration. > > I know, it means you can never ever migrate your storage from DSS to ESS > without a full backup and restore. Who with any significant amount of > storage is going to want to do that? The logic behind this escapes me, > or perhaps in that scenario IBM might relax the rules for the migration > period. > > >> You really need to talk with Lenovo as is your vendor. The fact that in >> your region your support is being given by an IBMer or not is not a >> relevant point. High enough in the chain always will end at IBM on any >> region as GNR is IBM tech for 17 years (yes 17) so if weird enough even >> on regions where Lenovo might do even third level it might end on >> development and/or research. But that is a Lenovo/IBM agreement not you >> and IBM. >> So please get the support statement from Lenovo about this and pls share >> it if you want/can so we all learn their position. > > Will attempt that, though I do think it should be a supported config out > the box :-) > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. 
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sun Oct 4 11:17:30 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sun, 4 Oct 2020 11:17:30 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk> On 04/10/2020 10:29, Luis Bolinches wrote: > Hi > > As stated on the same link you can do remote mounts from each other and > be a supported setup. > > ??You can use the remote mount feature of IBM Spectrum Scale to share > file system data across clusters.? > You can, but imagine I have a DSS-G cluster, with 2PB of storage on it which is quite modest in 2020. It is now end of life and for whatever reason I decide I want to move to ESS instead. What any sane storage admin want to do at this stage is set the ESS, add the ESS nodes to the existing cluster on the DSS-G then do a bit of mmadddisk/mmdeldisk and sit back while the data is seemlessly moved from the DSS-G to the ESS. Admittedly this might take a while :-) Then once all the data is moved a bit of mmdelnode and bingo the storage has been migrated from DSS-G to ESS with zero downtime. As that is not allowed for what I presume are commercial reasons (you could do it in reverse and presumable that is what IBM don't want) then once you are down the rabbit hole of one type of storage the you are not going to switch to a different one. You need to look at it from the perspective of the users. They frankly could not give a monkeys what storage solution you are using. All they care about is having usable storage and large amounts of downtime to switch from one storage type to another is not really acceptable. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From olaf.weiser at de.ibm.com Mon Oct 5 07:19:40 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 5 Oct 2020 06:19:40 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk> References: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: From jordi.caubet at es.ibm.com Mon Oct 5 07:27:39 2020 From: jordi.caubet at es.ibm.com (Jordi Caubet Serrabou) Date: Mon, 5 Oct 2020 06:27:39 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: Message-ID: ?Coming to the routing point, is there any reason why you need it ? I mean, this is because GPFS trying to connect between compute nodes or a reason outside GPFS scope ? If the reason is GPFS, imho best approach - without knowledge of the licensing you have - would be to use separate clusters: a storage cluster and two compute clusters. Both compute clusters join using multicluster setup the storage cluster. There is no need both compute clusters see each other, they only need to see the storage cluster. One of the clusters using the 10G, the other cluster using the IPoIB interface. 
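As a rough sketch of the remote-mount plumbing such a setup needs (cluster names, contact nodes and the file system name below are invented for illustration; the mmauth and mmremotecluster man pages have the exact options for your release):

    # on the owning (DSS-G) storage cluster
    mmauth genkey new
    mmauth update . -l AUTHONLY
    mmauth add teaching.cluster -k /tmp/teaching_id_rsa.pub   # public key copied over from the compute cluster
    mmauth grant teaching.cluster -f gpfs0

    # on each accessing compute cluster
    mmauth genkey new
    mmauth update . -l AUTHONLY
    mmremotecluster add storage.cluster -n dssg1,dssg2 -k /tmp/storage_id_rsa.pub
    mmremotefs add gpfs0 -f gpfs0 -C storage.cluster -T /gpfs/gpfs0
    mmmount gpfs0 -a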
You need at least three quorum nodes in each compute cluster but if licensing is per drive on the DSS, it is covered. -- Jordi Caubet Serrabou IBM Software Defined Infrastructure (SDI) and Flash Technical Sales Specialist Technical Computing and HPC IT Specialist and Architect Ext. Phone: (+34) 679.79.17.84 (internal 55834) E-mail: jordi.caubet at es.ibm.com > On 5 Oct 2020, at 08:19, Olaf Weiser wrote: > > ? > let me add a few comments from some very successful large installations in Eruope > > # InterOP > Even though (as Luis pointed to) , there is no support statement to run intermix DSS/ESS in general, it was ~, and is, and will be, ~ allowed for short term purposes, such as e.g migration. > The reason to not support those DSS/ESS mixed configuration in general is simply driven by the fact, that different release version of DSS/ESS potentially (not in every release, but sometimes) comes with different driver levels, (e.g. MOFED), OS, RDMA-settings, GPFS tuning, etc... > Those changes can have an impact/multiple impacts and therefore, we do not support that in general. Of course -and this would be the advice for every one - if you are faced the need to run a mixed configuration for e.g. a migration and/or e.g. cause of you need to temporary provide space etc... contact you IBM representative and settle to plan that accordingly.. > There will be (likely) some additional requirements/dependencies defined like driver versions, OS, and/or Scale versions, but you'll get a chance to run mixed configuration - temporary limited to your specific scenario. > > # Monitoring > No doubt, monitoring is essential and absolutely needed. - and/but - IBM wants customers to be very sensitive, what kind of additional software (=workload) gets installed on the ESS-IO servers. BTW, this rule applies as well to any other important GPFS node with special roles (e.g. any other NSD server etc) > But given the fact, that customer's usually manage and monitor their server farms from a central point of control (any 3rd party software), it is common/ best practice , that additionally monitor software(clients/endpoints) has to run on GPFS nodes, so as on ESS nodes too. > > If that way of acceptance applies for DSS too, you may want to double check with Lenovo ?! > > > #additionally GW functions > It would be a hot iron, to general allow routing on IO nodes. Similar to the mixed support approach, the field variety for such a statement would be hard(==impossible) to manage. As we all agree, additional network traffic can (and in fact will) impact GPFS. > In your special case, the expected data rates seems to me more than ok and acceptable to go with your suggested config (as long workloads remain on that level / monitor it accordingly as you are already obviously doing) > Again,to be on the safe side.. contact your IBM representative and I'm sure you 'll find a way.. > > > > kind regards.... > olaf > > > ----- Original message ----- > From: Jonathan Buzzard > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] Services on DSS/ESS nodes > Date: Sun, Oct 4, 2020 12:17 PM > > On 04/10/2020 10:29, Luis Bolinches wrote: > > Hi > > > > As stated on the same link you can do remote mounts from each other and > > be a supported setup. > > > > ? You can use the remote mount feature of IBM Spectrum Scale to share > > file system data across clusters.? 
> > > > You can, but imagine I have a DSS-G cluster, with 2PB of storage on it > which is quite modest in 2020. It is now end of life and for whatever > reason I decide I want to move to ESS instead. > > What any sane storage admin want to do at this stage is set the ESS, add > the ESS nodes to the existing cluster on the DSS-G then do a bit of > mmadddisk/mmdeldisk and sit back while the data is seemlessly moved from > the DSS-G to the ESS. Admittedly this might take a while :-) > > Then once all the data is moved a bit of mmdelnode and bingo the storage > has been migrated from DSS-G to ESS with zero downtime. > > As that is not allowed for what I presume are commercial reasons (you > could do it in reverse and presumable that is what IBM don't want) then > once you are down the rabbit hole of one type of storage the you are not > going to switch to a different one. > > You need to look at it from the perspective of the users. They frankly > could not give a monkeys what storage solution you are using. All they > care about is having usable storage and large amounts of downtime to > switch from one storage type to another is not really acceptable. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Salvo indicado de otro modo m?s arriba / Unless stated otherwise above: International Business Machines, S.A. Santa Hortensia, 26-28, 28002 Madrid Registro Mercantil de Madrid; Folio 1; Tomo 1525; Hoja M-28146 CIF A28-010791 -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Oct 5 09:40:56 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 5 Oct 2020 08:40:56 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> Message-ID: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> > I now need to check IBM are not going to throw a wobbler down the line > if I need to get support before deploying it to the DSS-G nodes :-) I know there were a lot of other emails about this ... I think you maybe want to be careful doing this. Whilst it might work when you setup the DSS-G like this, remember that the memory usage you are seeing at this point in time may not be what you always need. For example if you fail-over the recovery groups, you need to have enough free memory to handle this. E.g. a node failure, or more likely you are upgrading the building blocks. Personally I wouldn't run other things like this on my DSS-G storage nodes. We do run e.g. nrpe monitoring to collect and report faults, but this is pretty lightweight compared to everything else. They even removed support for running the gui packages on the IO nodes - the early DSS-G builds used the IO nodes for this, but now you need separate systems for this. 
Simon From jonathan.buzzard at strath.ac.uk Mon Oct 5 12:44:48 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 5 Oct 2020 12:44:48 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> Message-ID: <905a0bdb-b6a1-90e4-bf57-ed8edae6fb7c@strath.ac.uk> On 05/10/2020 07:27, Jordi Caubet Serrabou wrote: > ?Coming to the routing point, is there any reason why you need it ? I > mean, this is because GPFS trying to connect between compute nodes or > a reason outside GPFS scope ? > If the reason is GPFS, imho best approach - without knowledge of the > licensing you have - would be to use separate clusters: a storage > cluster and two compute clusters. The issue is that individual nodes want to talk to one another on the data interface. Which caught me by surprise as the cluster is set to admin mode central. The admin interface runs over ethernet for all nodes on a specific VLAN which which is given 802.1p priority 5 (that's Voice, < 10 ms latency and jitter). That saved a bunch of switching and cabling as you don't need the extra interface for the admin traffic. The cabling already significantly restricts airflow for a compute rack as it is, without adding a whole bunch more for a barely used admin interface. It's like the people who wrote the best practice about separate interface for the admin traffic know very little about networking to be frankly honest. This is all last century technology. The nodes for undergraduate teaching only have a couple of 1Gb ethernet ports which would suck for storage usage. However they also have QDR Infiniband. That is because even though undergraduates can't run multinode jobs, on the old cluster the Lustre storage was delivered over Infiniband, so they got Infiniband cards. > Both compute clusters join using multicluster setup the storage > cluster. There is no need both compute clusters see each other, they > only need to see the storage cluster. One of the clusters using the > 10G, the other cluster using the IPoIB interface. > You need at least three quorum nodes in each compute cluster but if > licensing is per drive on the DSS, it is covered. Three clusters is starting to get complicated from an admin perspective. The biggest issue is coordinating maintenance and keep sufficient quorum nodes up. Maintenance on compute nodes is done via the job scheduler. I know some people think this is crazy, but it is in reality extremely elegant. We can schedule a reboot on a node as soon as the current job has finished (usually used for firmware upgrades). Or we can schedule a job to run as root (usually for applying updates) as soon as the current job has finished. As such we have no way of knowing when that will be for a given node, and there is a potential for all three quorum nodes to be down at once. Using this scheme we can seamlessly upgrade the nodes safe in the knowledge that a node is either busy and it's running on the current configuration or it has been upgraded and is running the new configuration. Consequently multinode jobs are guaranteed to have all nodes in the job running on the same configuration. 
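For the reboot-after-current-job case, with Slurm (assumed here purely for illustration, the thread does not say which scheduler is in use) this is a single command, e.g.:

    # drain node123, reboot it as soon as the running job completes,
    # and return it to service automatically afterwards
    scontrol reboot ASAP nextstate=RESUME reason="firmware update" node123

Other schedulers have equivalents; the point is the node never sits idle waiting for a human.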
The alternative is to drain the node, but there is only a 23% chance the node will become available during working hours leading to a significant loss of compute time when doing maintenance compared to our existing scheme where the loss of compute time is only as long as the upgrade takes to install. Pretty much the only time we have idle nodes is when the scheduler is reserving nodes ready to schedule a multi node job. Right now we have a single cluster with the quorum nodes being the two DSS-G nodes and the node used for backup. It is easy to ensure that quorum is maintained on these, they also all run real RHEL, where as the compute nodes run CentOS. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From carlz at us.ibm.com Mon Oct 5 13:09:02 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Mon, 5 Oct 2020 12:09:02 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes Message-ID: <714B599F-D06D-4D03-98F3-A2BF6F7360DB@us.ibm.com> Jordi wrote: ?Both compute clusters join using multicluster setup the storage cluster. There is no need both compute clusters see each other, they only need to see the storage cluster. One of the clusters using the 10G, the other cluster using the IPoIB interface. You need at least three quorum nodes in each compute cluster but if licensing is per drive on the DSS, it is covered.? As a side note: One of the reasons we designed capacity (per Disk or per TB) licensing the way we did was specifically so that you could make this kind of architectural decision on its own merits, without worrying about a licensing penalty. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1243111775] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From carlz at us.ibm.com Mon Oct 5 13:20:25 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Mon, 5 Oct 2020 12:20:25 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes Message-ID: <288C3527-32BA-43E2-B5EF-E79CC5765424@us.ibm.com> >> Mixing DSS and ESS in the same cluster is not a supported configuration. > > I know, it means you can never ever migrate your storage from DSS to ESS > without a full backup and restore. Who with any significant amount of > storage is going to want to do that? The logic behind this escapes me, > or perhaps in that scenario IBM might relax the rules for the migration > period. > We do indeed relax the rules temporarily for a migration. The reasoning behind this rule is for support. Many Scale support issues - often the toughest ones - are not about a single node, but about the cluster or network as a whole. So if you have a mix of IBM systems with systems supported by an OEM (this applies to any OEM by the way, not just Lenovo) and a cluster-wide issue, who are you going to call. (Well, in practice you?re going to call IBM and we?ll do our best to help you despite limits on our knowledge of the OEM systems?). --CZ Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_386371469] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From jonathan.buzzard at strath.ac.uk Mon Oct 5 14:39:12 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 5 Oct 2020 14:39:12 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> Message-ID: On 05/10/2020 09:40, Simon Thompson wrote: >> I now need to check IBM are not going to throw a wobbler down the >> line if I need to get support before deploying it to the DSS-G >> nodes :-) > > I know there were a lot of other emails about this ... > > I think you maybe want to be careful doing this. Whilst it might work > when you setup the DSS-G like this, remember that the memory usage > you are seeing at this point in time may not be what you always need. > For example if you fail-over the recovery groups, you need to have > enough free memory to handle this. E.g. a node failure, or more > likely you are upgrading the building blocks. I think there is a lack of understanding on exactly how light weight keepalived is. It's the same code as on my routers which are admittedly different CPU's (MIPS to be precise) but memory usage (taking out shared memory usage - libc for example is loaded anyway) is under 200KB. A bash shell uses more memory... > > Personally I wouldn't run other things like this on my DSS-G storage > nodes. We do run e.g. nrpe monitoring to collect and report faults, > but this is pretty lightweight compared to everything else. They even > removed support for running the gui packages on the IO nodes - the > early DSS-G builds used the IO nodes for this, but now you need > separate systems for this. > And keepalived is in the same range as nrpe, which you do run :-) I have seen nrpe get out of hand and consume significant amounts of resources on a machine; the machine was ground to halt due to nrpe. One of the standard plugins was failing and sitting their busy waiting. Every five minutes it ran again. It of course decided to wait till ~7pm on a Friday to go wonky. By mid morning on Saturday it was virtually unresponsive, several minutes to get a shell... I would note that you can run keepalived quite happily on an Ubiquiti EdgeRouter X which has a dual core 880 MHz MIPS CPU with 256MB of RAM. Mikrotik have models with similar specs that run it too. On a dual Xeon Gold 6142 machine the usage of RAM and CPU by keepalived is noise. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From committee at io500.org Thu Oct 1 17:40:00 2020 From: committee at io500.org (committee at io500.org) Date: Thu, 01 Oct 2020 10:40:00 -0600 Subject: [gpfsug-discuss] IO500 SC20 Call for Submission Message-ID: <4a20ed6ae985a25c69d953e1ea633d62@io500.org> CALL FOR IO500 SUBMISSION Deadline: 30 October 2020 AoE Stabilization period: 1st October -- 9th October 2020 AoE The IO500 [1] is now accepting and encouraging submissions for the upcoming 7th IO500 list, to be revealed at the IO500 Virtual BOF during SC20. Once again, we are also accepting submissions to the 10 Node I/O Challenge to encourage submission of small scale results. The new ranked lists will be announced at our Virtual SC20 BoF. We hope to see you, and your results, there. 
A new change for the upcoming submission procedure is the introduction of a stabilization period that aims to harden the benchmark. The final benchmark is released at the end of this period. During the stabilization we encourage the community to test the proper execution of the benchmark and provide us with feedback. We will apply bug fixes to the code base and expect that results obtained will be valid as full submission. We also continue with another list for the Student Cluster Competition, since IO500 is used during this competition. Also new this year is that we have partnered with Anthony Kougkas' team at Illinois Institute of Technology to evaluate the submission metadata describing the storage system on which the test was run to improve the quality and usefulness of the data IO500 collects. You may be contacted by one of his students to clarify one or more of the metadata items from your submission(s). We would appreciate, but do not require, your cooperation to help improve the submission metadata quality. Results from their work will be fed back to improve our submission process for future lists. The IO500 benchmark suite is designed to be easy to run, and the community has multiple active support channels to help with any questions. Please submit results from your system, and we look forward to seeing many of you at SC20! Please note that submissions of all sizes are welcome, including multiple submissions from different storage systems/tiers at a single site. The website has customizable sorting so it is possible to submit on a small system and still get a very good per-client score, for example. Additionally, the list is about much more than just the raw rank; all submissions help the community by collecting and publishing a wider corpus of data. More details below. Following the success of the Top500 in collecting and analyzing historical trends in supercomputer technology and evolution, the IO500 [1] was created in 2017, published its first list at SC17, and has grown continuously since then. The need for such an initiative has long been known within High-Performance Computing; however, defining appropriate benchmarks had long been challenging. Despite this challenge, the community, after long and spirited discussion, finally reached consensus on a suite of benchmarks and a metric for resolving the scores into a single ranking. The multi-fold goals of the benchmark suite are as follows: * Maximizing simplicity in running the benchmark suite * Encouraging complexity in tuning for performance * Allowing submitters to highlight their "hero run" performance numbers * Forcing submitters to simultaneously report performance for challenging IO patterns. Specifically, the benchmark suite includes a hero-run of both IOR and mdtest configured however possible to maximize performance and establish an upper-bound for performance. It also includes an IOR and mdtest run with highly prescribed parameters in an attempt to determine a lower-bound on the performance. Finally, it includes a namespace search, as this has been determined to be a highly sought-after feature in HPC storage systems that have historically not been well-measured. Submitters are encouraged to share their tuning insights for publication. 
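For readers who have not run the suite before, the underlying tools look roughly like this (illustrative only, with made-up paths and process counts; official submissions must be produced with the io500 harness from the submission page linked below, not with standalone IOR/mdtest runs):

    mpirun -np 64 ior -w -r -F -t 1m -b 8g -o /gpfs/fs0/io500/ior_easy/testfile
    mpirun -np 64 mdtest -F -n 10000 -d /gpfs/fs0/io500/mdt_easy

The harness wraps runs of this kind for the easy and hard phases plus the namespace search, and enforces the prescribed parameters for the hard runs.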
The goals of the community are also multi-fold: * Gather historical data for the sake of analysis and to aid predictions of storage futures * Collect tuning information to share valuable performance optimizations across the community * Encourage vendors and designers to optimize for workloads beyond "hero runs" * Establish bounded expectations for users, procurers, and administrators 10 NODE I/O CHALLENGE The 10 Node Challenge is conducted using the regular IO500 benchmark, however, with the rule that exactly 10 client nodes must be used to run the benchmark. You may use any shared storage with, e.g., any number of servers. When submitting for the IO500 list, you can opt-in for "Participate in the 10 compute node challenge only", then we will not include the results into the ranked list. Other 10-node node submissions will be included in the full list and in the ranked list. We will announce the result in a separate derived list and in the full list but not on the ranked IO500 list at https://io500.org/ [2] BIRDS-OF-A-FEATHER Once again, we encourage you to submit [1], to join our community, and to attend our virtual BoF "The IO500 and the Virtual Institute of I/O" at SC20, where we will announce the new IO500 list, the 10 node challenge list, and the Student Cluster Competition list. We look forward to answering any questions or concerns you might have. * [1] http://www.vi4io.org/io500/submission [3] Thanks, The IO500 Committee Links: ------ [1] http://io500.org/ [2] https://io500.org/ [3] http://www.vi4io.org/io500/submission -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Oct 7 00:45:46 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Tue, 06 Oct 2020 19:45:46 -0400 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: <138651.1602027946@turing-police> On Sat, 03 Oct 2020 10:55:05 -0000, "Andrew Beattie" said: > Why do you need to run any kind of monitoring client on an IO server the > GUI / performance monitor already does all of that work for you and > collects the data on the dedicated EMS server. Does *ALL* that work for me? Will it toss you an alert if your sshd goes away, or if somebody's tossing packets that iptables is blocking for good reasons, or any of the many other things that a competent sysadmin wants to be alerted on that aren't GPFS, but which are things that Nagios and Zabbix and similar tools were invented to track? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Wed Oct 7 11:28:55 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 7 Oct 2020 10:28:55 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <138651.1602027946@turing-police> References: <138651.1602027946@turing-police> Message-ID: Agreed ... Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*. Tell me that kswapd is having one of those days. Tell me rsyslogd has stopped sending for some reason. Tell me if there are long waiters on the hosts. Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ... Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems... 
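On the long-waiters point, the sort of check that gets bolted onto Nagios/NRPE is only a few lines; a minimal sketch (the mmdiag output format differs slightly between releases, so treat the parsing as an assumption to be verified on your system):

    #!/bin/bash
    # check_gpfs_waiters: alert if any GPFS waiter exceeds a threshold (seconds)
    THRESHOLD=${1:-60}
    # mmdiag --waiters prints lines like "Waiting 12.3456 sec since 10:20:33, ..."
    LONGEST=$(/usr/lpp/mmfs/bin/mmdiag --waiters 2>/dev/null |
              awk '/^Waiting/ {print $2}' | sort -rn | head -1)
    LONGEST=${LONGEST:-0}
    if awk -v l="$LONGEST" -v t="$THRESHOLD" 'BEGIN {exit !(l > t)}'; then
        echo "CRITICAL: longest GPFS waiter ${LONGEST}s (threshold ${THRESHOLD}s)"
        exit 2
    fi
    echo "OK: longest GPFS waiter ${LONGEST}s"
    exit 0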
Simon ?On 07/10/2020, 00:45, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Valdis Kl?tnieks" wrote: On Sat, 03 Oct 2020 10:55:05 -0000, "Andrew Beattie" said: > Why do you need to run any kind of monitoring client on an IO server the > GUI / performance monitor already does all of that work for you and > collects the data on the dedicated EMS server. Does *ALL* that work for me? Will it toss you an alert if your sshd goes away, or if somebody's tossing packets that iptables is blocking for good reasons, or any of the many other things that a competent sysadmin wants to be alerted on that aren't GPFS, but which are things that Nagios and Zabbix and similar tools were invented to track? From jonathan.buzzard at strath.ac.uk Wed Oct 7 13:14:45 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 7 Oct 2020 13:14:45 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: <138651.1602027946@turing-police> Message-ID: On 07/10/2020 11:28, Simon Thompson wrote: > Agreed ... > > Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*. > Tell me that kswapd is having one of those days. > Tell me rsyslogd has stopped sending for some reason. > Tell me if there are long waiters on the hosts. > Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ... > > Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems... > The problem is the developers know as much about looking after a system in the real world as a tea leaf knows the history of the East India Company. IMHO to even ask the question shows a total lack of understanding of the issue. Consequently developers in their ivory towers have a habit of developing things that are as useful as a chocolate tea pot. Which putting it bluntly a competent sysadmins makes them look like a bunch of twits. I would note this is not a problem unique to IBM, it's developers in general. The appropriate course of action would be not for IBM to develop a monitoring tool of their own but to provide a bunch of plugins for the popular monitoring tools that customers will already be using to monitor their whole IT estate. Heaven forbid they could even run a poll to find out which ones the actual customers of their products are interested in rather than wasting effort developing software their customers are not actually interested in. For my purposes there is I think an alternative. The actual routing of the IP packets is not a service, it's a kernel configuration to have the kernel route that packets :-) Keepalived just manages a floating IP address. There are other options to achieve this. They are clunkier but they side step IBM's silly rules. I would however note at this point that at lots of sites all routing in the data centre is done using BGP. It comes in part out of the zero trust paradigm. I guess apparently running fail2ban is not permitted either. Can I even run firewalld? As you can seen a nothing else policy quickly becomes unsustainable IMHO. There is a disjuncture between the developers in their ivory towers and the real world. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From kkr at lbl.gov Tue Oct 13 22:34:23 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 13 Oct 2020 14:34:23 -0700 Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a panel? Message-ID: Hi all, By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we?re planning a couple 90-minute sessions and would like to do a panel during one of them. We?ll hope to do live Q&A, like an in-person Ask Me Anything session, but it?s probably a good idea to have a bank of questions ready as well, Also, that way your question may get asked, even if you can?t make the live session ?we?ll record these sessions for later viewing. So, please send your questions for the panel and we?ll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots: November 16th - 8:00 AM Pacific/3:00 PM UTC November 18th - 8:00 AM Pacific/3:00 PM UTC Best, Kristy Kristy Kallback-Rose Senior HPC Storage Systems Analyst National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From juergen.hannappel at desy.de Wed Oct 21 17:13:01 2020 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Wed, 21 Oct 2020 18:13:01 +0200 (CEST) Subject: [gpfsug-discuss] Mounting an nfs share on a CES node Message-ID: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de> Hi, I have a CES node exporting some filesystems vis smb and ganesha in a standard CES setup. Now I want to mount a nfs share from a different, non-CES server on this CES node. This did not work: mount -o -fstype=nfs4,minorversion=1,rw,rsize=65536,wsize=65536 some.other.server:/some/path /mnt/ mount.nfs: mount to NFS server 'some.other.server:/some/path' failed: RPC Error: Program unavailable Does the CES software stack interfere with the nfs client setup? It seems that at least with rpc-statd there is some conflict: systemctl status rpc-statd ? rpc-statd.service - NFS status monitor for NFSv2/3 locking. Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; static; vendor preset: disabled) Active: failed (Result: exit-code) since Wed 2020-10-21 17:48:21 CEST; 22min ago Process: 19896 ExecStart=/usr/sbin/rpc.statd $STATDARGS (code=exited, status=1/FAILURE) Oct 21 17:48:21 mynode systemd[1]: Starting NFS status monitor for NFSv2/3 locking.... Oct 21 17:48:21 mynode rpc.statd[19896]: Statd service already running! Oct 21 17:48:21 mynode systemd[1]: rpc-statd.service: control process exited, code=exited status=1 Oct 21 17:48:21 mynode systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking.. Oct 21 17:48:21 mynode systemd[1]: Unit rpc-statd.service entered failed state. Oct 21 17:48:21 mynode systemd[1]: rpc-statd.service failed. -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 From mnaineni at in.ibm.com Thu Oct 22 04:38:59 2020 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 22 Oct 2020 03:38:59 +0000 Subject: [gpfsug-discuss] Mounting an nfs share on a CES node In-Reply-To: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de> References: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de> Message-ID: An HTML attachment was scrubbed... 
URL: From andi at christiansen.xxx Tue Oct 27 11:46:02 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 27 Oct 2020 12:46:02 +0100 (CET) Subject: [gpfsug-discuss] Alternative to Scale S3 API. Message-ID: <1109480230.484366.1603799162955@privateemail.com> Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! -------------- next part -------------- An HTML attachment was scrubbed... URL: From NISHAAN at za.ibm.com Tue Oct 27 13:38:01 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Tue, 27 Oct 2020 15:38:01 +0200 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: <1109480230.484366.1603799162955@privateemail.com> References: <1109480230.484366.1603799162955@privateemail.com> Message-ID: Hi Andi The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html Not sure if there is any other way to talk HTTPS without using HAProxy. In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). Anyway, good luck with your testing. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 2020/10/27 13:59 Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. 
Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 52733301.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From andi at christiansen.xxx Wed Oct 28 06:24:52 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 28 Oct 2020 07:24:52 +0100 (CET) Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: References: <1109480230.484366.1603799162955@privateemail.com> Message-ID: <2126571944.509878.1603866292369@privateemail.com> Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example... I looked through your documentation and its very close to what we have set up today and found to not work... If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen.. Best Regards Andi Christiansen > On 10/27/2020 2:38 PM Nishaan Docrat wrote: > > > > Hi Andi > > The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. > > I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. 
> > You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html https://docs.openstack.org/swift/latest/s3_compat.html > > Not sure if there is any other way to talk HTTPS without using HAProxy. > > In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. > > https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ > > Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). > > Anyway, good luck with your testing. > > Kind Regards > > Nishaan Docrat > Client Technical Specialist - Storage Systems > IBM Systems Hardware > > Work: +27 (0)11 302 5001 > Mobile: +27 (0)81 040 3793 > Email: nishaan at za.ibm.com http://www.ibm.com/storage > > > > [Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withi]Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > Date: 2020/10/27 13:59 > Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > --------------------------------------------- > > > > Hi all, > > > > We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. > > > > Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. > > > > We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. > > The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. > > > > If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? > > Thank you in advance! > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 52733301.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From NISHAAN at za.ibm.com Wed Oct 28 06:45:29 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Wed, 28 Oct 2020 08:45:29 +0200 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: <2126571944.509878.1603866292369@privateemail.com> References: <1109480230.484366.1603799162955@privateemail.com> <2126571944.509878.1603866292369@privateemail.com> Message-ID: Hi Andi The s3api middleware does support multipart uploads.. https://docs.openstack.org/swift/latest/s3_compat.html The current version of Swift (PIKE) that is bundled with Spectrum Scale 5.0.X doesn't.. https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adm_ManagingOpenStackACLsviaAmazonS3API.htm According to the Spectrum Scale Roadmap, 5.1 is due out 2H20.. Not sure if someone from development can confirm the GA date. Does Veeam have a test utility? You could always test it using the current Swift AIO or if you can provide me with a test utility I can test that for you. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com From: Andi Christiansen To: gpfsug main discussion list , Nishaan Docrat Date: 2020/10/28 08:24 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some... This Message Is From an External Sender This message came from outside your organization. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example... I looked through your documentation and its very close to what we have set up today and found to not work... If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen.. Best Regards Andi Christiansen On 10/27/2020 2:38 PM Nishaan Docrat wrote: Hi Andi The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. 
https://docs.openstack.org/swift/latest/s3_compat.html Not sure if there is any other way to talk HTTPS without using HAProxy. In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). Anyway, good luck with your testing. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withi Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 2020/10/27 13:59 Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application (rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19991351.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From NISHAAN at za.ibm.com Wed Oct 28 07:12:55 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Wed, 28 Oct 2020 09:12:55 +0200 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: <2126571944.509878.1603866292369@privateemail.com> References: <1109480230.484366.1603799162955@privateemail.com> <2126571944.509878.1603866292369@privateemail.com> Message-ID: Hi Andi Sorry forgot to mention that I was told 5.1 will include the Swift Train release (2.23). The change from swift3 middleware to s3api was done in the Queens release (2.18) so 5.1 will definitely have multipart support. 
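When an endpoint at that level is available to test against, multipart behaviour can be checked with any stock S3 client, for example the AWS CLI (the endpoint reuses the example domain from earlier in the thread; the bucket name and key pair are placeholders for the object user's own):

    # the CLI switches to multipart automatically above its threshold
    aws configure set default.s3.multipart_threshold 64MB
    dd if=/dev/urandom of=/tmp/mp-test.bin bs=1M count=512
    aws --endpoint-url https://backup.mycompany.com s3 cp /tmp/mp-test.bin s3://mybucket/mp-test.bin
    aws --endpoint-url https://backup.mycompany.com s3 ls s3://mybucket/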
Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com From: Andi Christiansen To: gpfsug main discussion list , Nishaan Docrat Date: 2020/10/28 08:24 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some... This Message Is From an External Sender This message came from outside your organization. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example... I looked through your documentation and its very close to what we have set up today and found to not work... If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen.. Best Regards Andi Christiansen On 10/27/2020 2:38 PM Nishaan Docrat wrote: Hi Andi The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html Not sure if there is any other way to talk HTTPS without using HAProxy. In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). Anyway, good luck with your testing. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withi Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 2020/10/27 13:59 Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. 
Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application (rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17810834.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From luis.bolinches at fi.ibm.com Wed Oct 28 07:15:21 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 28 Oct 2020 07:15:21 +0000 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.2__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.2__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From NISHAAN at za.ibm.com Wed Oct 28 07:45:45 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Wed, 28 Oct 2020 09:45:45 +0200 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: References: Message-ID: Hi Luis Thanks for your reply.. It should address Andi's issue as the underlying Swift version is what is important and the functionality he needs is in the latest releases (I was told 5.1 includes Swift Train which is the latest version). Am sure there is a beta program for Spectrum Scale.. Perhaps Andi should speak to his software sales rep and ask to be included on it to get access so that he can test. 
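On the port 443 clash raised earlier in the thread (HAProxy terminating TLS for S3 while the Scale GUI also wants 443): one common workaround is to let HAProxy split the traffic by SNI so the GUI and the object store can share one address. This is only a sketch - hostnames and addresses are placeholders, the certificate must cover both names, and the backend port assumes the CES Swift proxy is listening on its usual 8080:

  frontend https_in
      bind *:443 ssl crt /etc/haproxy/certs/combined.pem
      acl is_gui ssl_fc_sni -i gui.mycompany.com
      use_backend scale_gui if is_gui
      default_backend scale_s3

  backend scale_s3
      balance roundrobin
      server ces1 10.0.0.11:8080 check
      server ces2 10.0.0.12:8080 check

  backend scale_gui
      server gui1 10.0.0.20:443 ssl verify none check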
Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com From: "Luis Bolinches" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 2020/10/28 09:29 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi 5.1.x is going GA very soon (TM). Would it address the issues Andi sees on his environment or not I cannot say. I can take it with Andi for more details on the GA date -- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions Luis Bolinches Consultant IT Specialist IBM Spectrum Scale development Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches Ab IBM Finland Oy Laajalahdentie 23 00330 Helsinki Uusimaa - Finland "If you always give you will always have" -- Anonymous ----- Original message ----- From: "Nishaan Docrat" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: Andi Christiansen Cc: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API. Date: Wed, Oct 28, 2020 08:47 Hi Andi The s3api middleware does support multipart uploads.. https://docs.openstack.org/swift/latest/s3_compat.html The current version of Swift (PIKE) that is bundled with Spectrum Scale 5.0.X doesn't.. https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adm_ManagingOpenStackACLsviaAmazonS3API.htm According to the Spectrum Scale Roadmap, 5.1 is due out 2H20.. Not sure if someone from development can confirm the GA date. Does Veeam have a test utility? You could always test it using the current Swift AIO or if you can provide me with a test utility I can test that for you. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com Inactive hide details for Andi Christiansen ---2020/10/28 08:24:55---Hi Nishaan, Thanks for you reply.Andi Christiansen ---2020/10/28 08:24:55---Hi Nishaan, Thanks for you reply. From: Andi Christiansen To: gpfsug main discussion list , Nishaan Docrat Date: 2020/10/28 08:24 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some... This Message Is From an External Sender This message came from outside your organization. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example... I looked through your documentation and its very close to what we have set up today and found to not work... If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? 
and dont know if that is actually going to be supported in that release then i cant wait for that to happen.. Best Regards Andi Christiansen On 10/27/2020 2:38 PM Nishaan Docrat wrote: Hi Andi The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html Not sure if there is any other way to talk HTTPS without using HAProxy. In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). Anyway, good luck with your testing. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withiAndi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 2020/10/27 13:59 Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! 
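On the MinIO question above: a single-node trial on top of a Scale file system is only a handful of commands, although the exact flags and mc syntax vary between MinIO releases, and everything below (paths, hostnames, credentials) is a placeholder sketch rather than a tested recipe:

  # serve a GPFS directory over HTTPS on 443 (TLS key/cert in the certs dir)
  export MINIO_ACCESS_KEY=admin
  export MINIO_SECRET_KEY=change-me
  minio server --address :443 --certs-dir /etc/minio/certs /gpfs/fs1/minio

  # per-tenant credentials so a backup client only ever sees its own bucket
  mc alias set scale https://backup.mycompany.com admin change-me
  mc mb scale/veeam01
  mc admin user add scale veeam01-user veeam01-secret
  mc admin policy add scale veeam01-only /tmp/veeam01-policy.json
  mc admin policy set scale veeam01-only user=veeam01-user

The bucket-scoped policy JSON is not shown; without it the user would need one of the built-in policies, which grant broader than per-bucket access.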
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 16781831.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From luis.bolinches at fi.ibm.com Wed Oct 28 07:51:30 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 28 Oct 2020 07:51:30 +0000 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=CDBB0C9CDFB9D9C48f9e8a93df938690918cCDB at .jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.2__=CDBB0C9CDFB9D9C48f9e8a93df938690918cCDB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Thu Oct 29 11:16:13 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 29 Oct 2020 11:16:13 +0000 Subject: [gpfsug-discuss] SSUG Digital Expert Talk: 11/4 - AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale Message-ID: <77EA43ED-C430-42CA-872E-D2307F244775@nuance.com> Reminder for our upcoming expert talk: SSUG::Digital: Scalable multi-node training for AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale November 4 @ 16:15 - 17:45 GMT Nvidia and IBM did a complex proof-of-concept to demonstrate the scaling of AI workload using Nvidia DGX, Red Hat OpenShift and IBM Spectrum Scale at the example of ResNet-50 and the segmentation of images using the Audi A2D2 dataset. The project team published an IBM Redpaper with all the technical details and will present the key learnings and results. Registration link for Webex session: https://www.spectrumscaleug.org/event/ssugdigital-multi-node-training-for-ai-workloads/ Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Oct 29 21:43:02 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 29 Oct 2020 14:43:02 -0700 Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a panel? In-Reply-To: References: Message-ID: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov> Really? There?s nothing you want to ask about GPFS/Spectrum Scale? There will be access to developers and management alike, so I have to imagine you have something to ask? Don?t be shy. 
Please help make this a lively discussion by submitting a question, or two.

Best,
Kristy

> On Oct 13, 2020, at 2:34 PM, Kristy Kallback-Rose wrote:
>
> Hi all,
>
> By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we're planning a couple 90-minute sessions and would like to do a panel during one of them. We'll hope to do live Q&A, like an in-person Ask Me Anything session, but it's probably a good idea to have a bank of questions ready as well. Also, that way your question may get asked, even if you can't make the live session; we'll record these sessions for later viewing.
>
> So, please send your questions for the panel and we'll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots:
>
> November 16th - 8:00 AM Pacific/3:00 PM UTC
>
> November 18th - 8:00 AM Pacific/3:00 PM UTC
>
> Best,
> Kristy
>
> Kristy Kallback-Rose
> Senior HPC Storage Systems Analyst
> National Energy Research Scientific Computing Center
> Lawrence Berkeley National Laboratory
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kkr at lbl.gov  Thu Oct 29 21:49:34 2020
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Thu, 29 Oct 2020 14:49:34 -0700
Subject: [gpfsug-discuss] SC20 Sessions - Dates and times are settled, please join us!
Message-ID: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov>

Hi all,

The Spectrum Scale User Group will be hosting two 90-minute sessions at SC20 this year and we hope you can join us.

The first one is "Storage for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST, and the second one is "What's new in Spectrum Scale 5.1?" and will be held Wednesday, Nov. 18th from 11:00-12:30 EST.

Please see the calendar at https://www.spectrumscaleug.org/eventslist/2020-11/ and register by clicking on a session on the calendar and then the "Please register here to join the session" link.

Best,
Kristy

Kristy Kallback-Rose
Senior HPC Storage Systems Analyst
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory

From heinrich.billich at id.ethz.ch  Fri Oct 30 12:21:58 2020
From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD))
Date: Fri, 30 Oct 2020 12:21:58 +0000
Subject: [gpfsug-discuss] 'ganesha_mgr display_export - client not listed
Message-ID: <660DD807-C723-44EF-BC51-57EFB296FFC4@id.ethz.ch>

Hello,

One NFSv4 client of ganesha does not show up in the output of 'ganesha_mgr display_export'. The client has an active mount, but also shows some NFS issues: some commands did hang, the process just stays in state D (uninterruptible sleep) according to 'ps', but not the whole mount. I just wonder if the client's IP should always show up in the output of display_export once the client has issued a mount call, and if its absence indicates that something is broken.

Putting it the other way round: when is a client listed in the output of display_export and when is it removed from the list?

We do collect more debug data; this is just something that caught my eye.

Thank you,

Heiner

We run ganesha 2.7.5-ibm058.05 on a Spectrum Scale system on RedHat 7.7. I crosspost to the gpfsug mailing list.
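One cheap check while the debug data is collected, assuming the per-IP entries are added dynamically as clients mount rather than coming from the export's configured client list: each protocol node's ganesha only reports what it is serving itself, so the entry may simply live on a different CES node than the one queried. Node names and the export id below are placeholders:

  CLIENT=a.b.c.198                     # client IP you expect to see
  EXPORT=37
  for node in ces01 ces02 ces03; do    # your CES protocol nodes
      if ssh "$node" "ganesha_mgr display_export $EXPORT" | grep -q "^$CLIENT/"; then
          echo "$node: $CLIENT listed"
      else
          echo "$node: $CLIENT NOT listed"
      fi
  done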
--
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich at id.ethz.ch
========================

# ganesha_mgr display_export 37
Display export with id 37
export 37: path = /xxxx/yyy, pseudo = /xxx/yyy , tag = /xxx/yyy
Client type, CIDR version, CIDR address, CIDR mask, CIDR proto, Anonymous UID, Anonymous GID, Attribute timeout, Options, Set
a.b.c.198/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.143/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.236/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.34/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.70/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.71/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
*, 0, 0, 0, 0, 4294967294, 4294967294, 0, 1126187490, 1081209831

From skylar2 at uw.edu  Fri Oct 30 14:01:37 2020
From: skylar2 at uw.edu (Skylar Thompson)
Date: Fri, 30 Oct 2020 07:01:37 -0700
Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a panel?
In-Reply-To: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>
References: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>
Message-ID: <20201030140137.hakhxwppcmaoixy6@thargelion>

Here's one: How is IBM working to improve the integration between TSM and GPFS? We're in the biomedical space and have some overlapping regulatory requirements around retention, which translate to complicated INCLUDE/EXCLUDE rules that mmbackup has always had trouble processing. In particular, we need to be able to INCLUDE particular paths to set a management class, but then EXCLUDE particular paths, which results in mmbackup generating file lists for dsmc including those excluded paths, which dsmc can exclude but it logs every single one every time it runs.

On Thu, Oct 29, 2020 at 02:43:02PM -0700, Kristy Kallback-Rose wrote:
> Really? There's nothing you want to ask about GPFS/Spectrum Scale? There will be access to developers and management alike, so I have to imagine you have something to ask? Don't be shy.
>
> Please help make this a lively discussion by submitting a question, or two.
>
> Best,
> Kristy
>
> > On Oct 13, 2020, at 2:34 PM, Kristy Kallback-Rose wrote:
> >
> > Hi all,
> >
> > By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we're planning a couple 90-minute sessions and would like to do a panel during one of them. We'll hope to do live Q&A, like an in-person Ask Me Anything session, but it's probably a good idea to have a bank of questions ready as well. Also, that way your question may get asked, even if you can't make the live session; we'll record these sessions for later viewing.
> >
> > So, please send your questions for the panel and we'll get a list built up. Better yet, attend the sessions live!
Details to come, but for now, hold these time slots: > > > > November 16th - 8:00 AM Pacific/3:00 PM UTC > > > > November 18th - 8:00 AM Pacific/3:00 PM UTC > > > > Best, > > Kristy > > > > Kristy Kallback-Rose > > Senior HPC Storage Systems Analyst > > National Energy Research Scientific Computing Center > > Lawrence Berkeley National Laboratory > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From cblack at nygenome.org Fri Oct 30 14:19:24 2020 From: cblack at nygenome.org (Christopher Black) Date: Fri, 30 Oct 2020 14:19:24 +0000 Subject: [gpfsug-discuss] SC20 Sessions - Dates and times are settled, please join us! In-Reply-To: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov> References: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov> Message-ID: <62E7471D-02B9-4C27-B0F0-4038CCB2C66E@nygenome.org> Could you talk about upcoming work to address excessive prefetch when reading small fractions of many large files? Some bioinformatics workloads have a client node reading relatively small regions of multiple 50GB+ files. We've seen this trigger excessive prefetch bandwidth (especially on 16MB block filesystem). Investigation shows that much of the prefetched data is never read, but cache gets full, evicts blocks, then more prefetch happens. We can avoid this by turning prefetch off, but that reduces speed of other workloads that read full files sequentially. Turning prefetch on and off based on job won't work well for our users. We've heard this would be addressed in gpfs 5.1 at the earliest and have provided an example workload to devs. They've done some great analysis and determined the problem is worse on large (16M) block filesystems (which are now the recommended and default on new ess filesystems with sub-block allocation enabled). Best, Chris ?On 10/29/20, 5:49 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Kristy Kallback-Rose" wrote: Hi all, The Spectrum Scale User Group will be hosting two 90 minute sessions at SC20 this year and we hope you can join us. The first one is: "Storage for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST and the second one is "What's new in Spectrum Scale 5.1?" and will be held Wednesday, Nov. 18th from 11:00-12:30 EST. Please see the calendar at https://urldefense.com/v3/__https://www.spectrumscaleug.org/eventslist/2020-11/__;!!C6sPl7C9qQ!G0wT65UH3HoMnjBM6_ZAVfZwWwJz5SoLE5gpB_LM0N8SNSU3TXItF31dfxG_8Pow$ and register by clicking on a session on the calendar and then the "Please register here to join the session" link. Best, Kristy Kristy Kallback-Rose Senior HPC Storage Systems Analyst National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!C6sPl7C9qQ!G0wT65UH3HoMnjBM6_ZAVfZwWwJz5SoLE5gpB_LM0N8SNSU3TXItF31df0lybvoA$ ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. 
If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. From jonathan.buzzard at strath.ac.uk Fri Oct 2 17:14:12 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 2 Oct 2020 17:14:12 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes Message-ID: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> What if any are the rules around running additional services on DSS/ESS nodes with regard to support? Let me outline our scenario Our main cluster uses 10Gbps ethernet for storage with the DSS-G nodes hooked up with redundant 40Gbps ethernet. However we have an older cluster that is used for undergraduate teaching that only has 1Gbps ethernet and QDR Infiniband. With no money to upgrade this to 10Gbps ethernet to support this we flipped one of the ports on the ConnectX4 cards on each DSS-G node to Infiniband and using IPoIB run the teaching nodes in this way. However it means that we need an Ethernet to Infiniband gateway as the ethernet only connected nodes want to talk to the Infiniband connected ones on their Infiniband address. Not a problem we grabbed an old spare machine installed CentOS and configured it up to act as a bridge, and deploy a custom route to all the ethernet only connected nodes. It has been working fine for a couple of years now. The problem is that this becomes firstly a single point of failure, on hardware that is six years old now. Secondly to apply updates on the gateway machine means all the teaching nodes have to be drained and GPFS umounted to reboot the machine after updates have been installed. It is currently not getting patched as frequently as I would like (and required by the Scottish government). So thinking about it I have come to the conclusion that the ideal situation would be to use the DSS-G nodes as the gateway and run keepalived to move the gateway ethernet IP address between the two machines. It is idea because as long as one DSS-G node is up then there is a functioning gateway and nodes don't get ejected from the cluster. If both DSS-G nodes are down then there is no GPFS to mount anyway and lack of a gateway is a moot point. I grabbed a couple of the teaching compute nodes in the summer and trialed it out. It works a treat. I now need to check IBM are not going to throw a wobbler down the line if I need to get support before deploying it to the DSS-G nodes :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Fri Oct 2 23:19:15 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Fri, 2 Oct 2020 22:19:15 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> Message-ID: Jonathan, I suggest you get a formal statement from Lenovo as the DSS-G Platform is no longer an IBM platform. But for ESS based platforms the answer would be, it is not supported to run anything on the IO Servers other than GNR and the relevant Scale management services, due to the fact that if you lose an IO Server, or if you in an extended maintenance window the Server needs to host all the work that would be being performed by both IO servers. I don't know if Lenovo have different point if view. 
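For reference, the keepalived side of the gateway idea discussed in this thread is small: a VRRP instance on each DSS-G node holding the gateway address, IP forwarding enabled, and the Ethernet-only nodes given a static route to the IPoIB subnet via that address. Interface names, addresses and the password below are placeholders:

  # /etc/keepalived/keepalived.conf (both DSS-G nodes, different priorities)
  vrrp_instance ib_gw {
      state BACKUP
      interface bond0                  # ethernet-facing interface
      virtual_router_id 51
      priority 100                     # e.g. 110 on the peer node
      advert_int 1
      nopreempt
      authentication {
          auth_type PASS
          auth_pass changeme
      }
      virtual_ipaddress {
          10.10.0.254/24 dev bond0     # gateway address the clients route via
      }
  }
  # on both nodes:            sysctl -w net.ipv4.ip_forward=1
  # on ethernet-only clients: ip route add 10.20.0.0/24 via 10.10.0.254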
Regards, Andrew Sent from my iPhone > On 3 Oct 2020, at 02:14, Jonathan Buzzard wrote: > > > What if any are the rules around running additional services on DSS/ESS > nodes with regard to support? Let me outline our scenario > > Our main cluster uses 10Gbps ethernet for storage with the DSS-G nodes > hooked up with redundant 40Gbps ethernet. > > However we have an older cluster that is used for undergraduate teaching > that only has 1Gbps ethernet and QDR Infiniband. With no money to > upgrade this to 10Gbps ethernet to support this we flipped one of the > ports on the ConnectX4 cards on each DSS-G node to Infiniband and using > IPoIB run the teaching nodes in this way. > > However it means that we need an Ethernet to Infiniband gateway as the > ethernet only connected nodes want to talk to the Infiniband connected > ones on their Infiniband address. Not a problem we grabbed an old spare > machine installed CentOS and configured it up to act as a bridge, and > deploy a custom route to all the ethernet only connected nodes. It has > been working fine for a couple of years now. > > The problem is that this becomes firstly a single point of failure, on > hardware that is six years old now. Secondly to apply updates on the > gateway machine means all the teaching nodes have to be drained and GPFS > umounted to reboot the machine after updates have been installed. It is > currently not getting patched as frequently as I would like (and > required by the Scottish government). > > So thinking about it I have come to the conclusion that the ideal > situation would be to use the DSS-G nodes as the gateway and run > keepalived to move the gateway ethernet IP address between the two > machines. It is idea because as long as one DSS-G node is up then there > is a functioning gateway and nodes don't get ejected from the cluster. > If both DSS-G nodes are down then there is no GPFS to mount anyway and > lack of a gateway is a moot point. > > I grabbed a couple of the teaching compute nodes in the summer and > trialed it out. It works a treat. > > I now need to check IBM are not going to throw a wobbler down the line > if I need to get support before deploying it to the DSS-G nodes :-) > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sat Oct 3 11:06:41 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 3 Oct 2020 11:06:41 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: <7af0ac41-a280-5ecd-3658-7af761a4bf9b@strath.ac.uk> On 02/10/2020 23:19, Andrew Beattie wrote: > Jonathan, > I suggest you get a formal statement from Lenovo as the DSS-G Platform > is no longer an IBM platform. > > But for ESS based platforms the answer would be, it is not supported to > run anything on the IO Servers other than GNR and the relevant Scale > management services, due to the fact that if you lose an IO Server, or > if you in an extended maintenance window the Server needs to host all > the work that would be being performed by both IO servers. > In the past ~500 days the Infiniband to Ethernet gateway has shifted ~13GB of data, or about 25MB a day. 
Meanwhile in the last 470 days the DSS-G nodes have each shifted several PB. The proposed additional traffic is a drop in the ocean. On my actual routers which shift much more data (over 300TB externally) with an uptime of ~180 days at the moment the CPU time consumed by keepalived is just under 31 minutes or about 8 seconds a day. These are much punier CPU's too. The proposed additional CPU usage is another drop in the ocean. Given Lenovo sold the *same* configuration with x3650's and SR650's the "need all the CPU grunt" is somewhat fishy. Between the bid being submitted and actual tender award the SR650's came out and we paid a bit extra to uplift to the newer server hardware with exactly the same disk configuration. I believe IBM have done the same with the ESS/GNR servers too over time the same applies there too. IMHO given keepalived is a base RHEL package, IBM/Lenovo should be offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as a supported configuration for mixed network technology clusters :-) Running a couple extra servers for this purpose is obnoxious from an environmental standpoint. That's IBM's green credentials out the window if you ask me. I would note under those rules running a Nagios, Zabbix etc. client on the nodes is not permitted either. I would suggest that most sites would be rather unhappy about that :-) > I don't know if Lenovo have different point if view. > Problem is when I ring up for support on my DSS-G I speak to an IBM employee not a Lenovo one :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Sat Oct 3 11:55:05 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Sat, 3 Oct 2020 10:55:05 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <7af0ac41-a280-5ecd-3658-7af761a4bf9b@strath.ac.uk> Message-ID: Why do you need to run any kind of monitoring client on an IO server the GUI / performance monitor already does all of that work for you and collects the data on the dedicated EMS server. If you have a small storage environment the. Yes the processor and memory may feel like overkill, but tuned appropriately an IO server will use all the memory you can give it to drive IO performance, If you want to run a hybrid / non standard architecture then the IBM ESS / DGSS platform may not be the right platform in comparison to a build your own architecture, how ever you then take all the support issues onto your self rather than it being the vendors problem. Sent from my iPhone > On 3 Oct 2020, at 20:06, Jonathan Buzzard wrote: > > On 02/10/2020 23:19, Andrew Beattie wrote: >> Jonathan, >> I suggest you get a formal statement from Lenovo as the DSS-G Platform >> is no longer an IBM platform. >> >> But for ESS based platforms the answer would be, it is not supported to >> run anything on the IO Servers other than GNR and the relevant Scale >> management services, due to the fact that if you lose an IO Server, or >> if you in an extended maintenance window the Server needs to host all >> the work that would be being performed by both IO servers. >> > > In the past ~500 days the Infiniband to Ethernet gateway has shifted > ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the > DSS-G nodes have each shifted several PB. The proposed additional > traffic is a drop in the ocean. 
> > On my actual routers which shift much more data (over 300TB externally) > with an uptime of ~180 days at the moment the CPU time consumed by > keepalived is just under 31 minutes or about 8 seconds a day. These are > much punier CPU's too. The proposed additional CPU usage is another drop > in the ocean. > > Given Lenovo sold the *same* configuration with x3650's and SR650's the > "need all the CPU grunt" is somewhat fishy. Between the bid being > submitted and actual tender award the SR650's came out and we paid a bit > extra to uplift to the newer server hardware with exactly the same disk > configuration. I believe IBM have done the same with the ESS/GNR servers > too over time the same applies there too. > > IMHO given keepalived is a base RHEL package, IBM/Lenovo should be > offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as > a supported configuration for mixed network technology clusters :-) > > Running a couple extra servers for this purpose is obnoxious from an > environmental standpoint. That's IBM's green credentials out the window > if you ask me. > > I would note under those rules running a Nagios, Zabbix etc. client on > the nodes is not permitted either. I would suggest that most sites would > be rather unhappy about that :-) > > >> I don't know if Lenovo have different point if view. >> > > Problem is when I ring up for support on my DSS-G I speak to an IBM > employee not a Lenovo one :-) > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Sat Oct 3 12:19:36 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sat, 3 Oct 2020 11:19:36 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: Message-ID: Are you mixing those ESS DSS in the same cluster? Or you are only running DSS https://www.ibm.com/support/knowledgecenter/SSYSP8/gnrfaq.html?view=kc#supportqs__building Mixing DSS and ESS in the same cluster is not a supported configuration. You really need to talk with Lenovo as is your vendor. The fact that in your region your support is being given by an IBMer or not is not a relevant point. High enough in the chain always will end at IBM on any region as GNR is IBM tech for 17 years (yes 17) so if weird enough even on regions where Lenovo might do even third level it might end on development and/or research. But that is a Lenovo/IBM agreement not you and IBM. So please get the support statement from Lenovo about this and pls share it if you want/can so we all learn their position. Thanks. -- Cheers > On 3. Oct 2020, at 13.55, Andrew Beattie wrote: > > ? > Why do you need to run any kind of monitoring client on an IO server the GUI / performance monitor already does all of that work for you and collects the data on the dedicated EMS server. > > If you have a small storage environment the. 
Yes the processor and memory may feel like overkill, but tuned appropriately an IO server will use all the memory you can give it to drive IO performance, > > If you want to run a hybrid / non standard architecture then the IBM ESS / DGSS platform may not be the right platform in comparison to a build your own architecture, how ever you then take all the support issues onto your self rather than it being the vendors problem. > > Sent from my iPhone > > > On 3 Oct 2020, at 20:06, Jonathan Buzzard wrote: > > > > On 02/10/2020 23:19, Andrew Beattie wrote: > >> Jonathan, > >> I suggest you get a formal statement from Lenovo as the DSS-G Platform > >> is no longer an IBM platform. > >> > >> But for ESS based platforms the answer would be, it is not supported to > >> run anything on the IO Servers other than GNR and the relevant Scale > >> management services, due to the fact that if you lose an IO Server, or > >> if you in an extended maintenance window the Server needs to host all > >> the work that would be being performed by both IO servers. > >> > > > > In the past ~500 days the Infiniband to Ethernet gateway has shifted > > ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the > > DSS-G nodes have each shifted several PB. The proposed additional > > traffic is a drop in the ocean. > > > > On my actual routers which shift much more data (over 300TB externally) > > with an uptime of ~180 days at the moment the CPU time consumed by > > keepalived is just under 31 minutes or about 8 seconds a day. These are > > much punier CPU's too. The proposed additional CPU usage is another drop > > in the ocean. > > > > Given Lenovo sold the *same* configuration with x3650's and SR650's the > > "need all the CPU grunt" is somewhat fishy. Between the bid being > > submitted and actual tender award the SR650's came out and we paid a bit > > extra to uplift to the newer server hardware with exactly the same disk > > configuration. I believe IBM have done the same with the ESS/GNR servers > > too over time the same applies there too. > > > > IMHO given keepalived is a base RHEL package, IBM/Lenovo should be > > offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as > > a supported configuration for mixed network technology clusters :-) > > > > Running a couple extra servers for this purpose is obnoxious from an > > environmental standpoint. That's IBM's green credentials out the window > > if you ask me. > > > > I would note under those rules running a Nagios, Zabbix etc. client on > > the nodes is not permitted either. I would suggest that most sites would > > be rather unhappy about that :-) > > > > > >> I don't know if Lenovo have different point if view. > >> > > > > Problem is when I ring up for support on my DSS-G I speak to an IBM > > employee not a Lenovo one :-) > > > > > > JAB. > > > > -- > > Jonathan A. Buzzard Tel: +44141-5483420 > > HPC System Administrator, ARCHIE-WeSt. > > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Sat Oct 3 18:16:33 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 3 Oct 2020 18:16:33 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: On 03/10/2020 11:55, Andrew Beattie wrote: > Why do you need to run any kind of monitoring client on an IO server the > GUI / performance monitor already does all of that work for you and > collects the data on the dedicated EMS server. Because any remotely sensible admin demands a single pane service monitoring system. If I have to look at A for everything but my DSS-G and B for my DSS-G that's an epic fail. I often feel there is a huge disjuncture between the people that develop systems and those that look after them; they think the world revolves around them. It is clear this is one of those cases. > > If you have a small storage environment the. Yes the processor and > memory may feel like overkill, but tuned appropriately an IO server will > use all the memory you can give it to drive IO performance, Right but the SR650's came with not only more CPU but more RAM than the x3650's. In which case why only 192GB of RAM? The SR650's can take much more than that. Why not 384GB of RAM :-) Right now we have a shade over 50GB of RAM being unused. Been way for like ever because we naughtily have a influx DB client setup for a Grafana dashboard. We also presumably naughtily have remote syslog to Splunk. > > If you want to run a hybrid / non standard architecture then the IBM ESS > / DGSS platform may not be the right platform in comparison to a build > your own architecture, how ever you then take all the support issues > onto your self rather than it being the vendors problem. > I don't see anything that says you can't have some clients ethernet connected and some Infiniband connected. That of course requires a gateway, and the most logical place to put it is on the ESS or DSS nodes IMHO. I will see what Lenovo has to say, but looks like the IBM position is decidedly let's burn the planet, who gives a dam. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Sat Oct 3 18:16:39 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 3 Oct 2020 18:16:39 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: On 03/10/2020 12:19, Luis Bolinches wrote: > Are you mixing those ESS DSS in the same cluster? Or you are only > running DSS > Only running DSS. We are too far down the rabbit hole to ever switch to ESS now. > > Mixing DSS and ESS in the same cluster is not a supported configuration. > I know, it means you can never ever migrate your storage from DSS to ESS without a full backup and restore. Who with any significant amount of storage is going to want to do that? The logic behind this escapes me, or perhaps in that scenario IBM might relax the rules for the migration period. > You really need to talk with Lenovo as is your vendor. The fact that in > your region your support is being given by an IBMer or not is not a > relevant point. High enough in the chain always will end at IBM on any > region as GNR is IBM tech for 17 years (yes 17) so if weird enough even > on regions where Lenovo might do even third level it might end on > development and/or research. But that is a Lenovo/IBM agreement not you > and IBM. 
> > So please get the support statement from Lenovo about this and pls share > it if you want/can so we all learn their position. > Will attempt that, though I do think it should be a supported config out the box :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From luis.bolinches at fi.ibm.com Sun Oct 4 10:29:34 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sun, 4 Oct 2020 09:29:34 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: Message-ID: Hi As stated on the same link you can do remote mounts from each other and be a supported setup. ? You can use the remote mount feature of IBM Spectrum Scale to share file system data across clusters.? -- Cheers > On 3. Oct 2020, at 20.16, Jonathan Buzzard wrote: > > ?On 03/10/2020 12:19, Luis Bolinches wrote: >> Are you mixing those ESS DSS in the same cluster? Or you are only >> running DSS > > Only running DSS. We are too far down the rabbit hole to ever switch to > ESS now. > >> Mixing DSS and ESS in the same cluster is not a supported configuration. > > I know, it means you can never ever migrate your storage from DSS to ESS > without a full backup and restore. Who with any significant amount of > storage is going to want to do that? The logic behind this escapes me, > or perhaps in that scenario IBM might relax the rules for the migration > period. > > >> You really need to talk with Lenovo as is your vendor. The fact that in >> your region your support is being given by an IBMer or not is not a >> relevant point. High enough in the chain always will end at IBM on any >> region as GNR is IBM tech for 17 years (yes 17) so if weird enough even >> on regions where Lenovo might do even third level it might end on >> development and/or research. But that is a Lenovo/IBM agreement not you >> and IBM. >> So please get the support statement from Lenovo about this and pls share >> it if you want/can so we all learn their position. > > Will attempt that, though I do think it should be a supported config out > the box :-) > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sun Oct 4 11:17:30 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sun, 4 Oct 2020 11:17:30 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk> On 04/10/2020 10:29, Luis Bolinches wrote: > Hi > > As stated on the same link you can do remote mounts from each other and > be a supported setup. > > ??You can use the remote mount feature of IBM Spectrum Scale to share > file system data across clusters.? > You can, but imagine I have a DSS-G cluster, with 2PB of storage on it which is quite modest in 2020. It is now end of life and for whatever reason I decide I want to move to ESS instead. 
What any sane storage admin want to do at this stage is set the ESS, add the ESS nodes to the existing cluster on the DSS-G then do a bit of mmadddisk/mmdeldisk and sit back while the data is seemlessly moved from the DSS-G to the ESS. Admittedly this might take a while :-) Then once all the data is moved a bit of mmdelnode and bingo the storage has been migrated from DSS-G to ESS with zero downtime. As that is not allowed for what I presume are commercial reasons (you could do it in reverse and presumable that is what IBM don't want) then once you are down the rabbit hole of one type of storage the you are not going to switch to a different one. You need to look at it from the perspective of the users. They frankly could not give a monkeys what storage solution you are using. All they care about is having usable storage and large amounts of downtime to switch from one storage type to another is not really acceptable. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From olaf.weiser at de.ibm.com Mon Oct 5 07:19:40 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 5 Oct 2020 06:19:40 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk> References: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: From jordi.caubet at es.ibm.com Mon Oct 5 07:27:39 2020 From: jordi.caubet at es.ibm.com (Jordi Caubet Serrabou) Date: Mon, 5 Oct 2020 06:27:39 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: Message-ID: ?Coming to the routing point, is there any reason why you need it ? I mean, this is because GPFS trying to connect between compute nodes or a reason outside GPFS scope ? If the reason is GPFS, imho best approach - without knowledge of the licensing you have - would be to use separate clusters: a storage cluster and two compute clusters. Both compute clusters join using multicluster setup the storage cluster. There is no need both compute clusters see each other, they only need to see the storage cluster. One of the clusters using the 10G, the other cluster using the IPoIB interface. You need at least three quorum nodes in each compute cluster but if licensing is per drive on the DSS, it is covered. -- Jordi Caubet Serrabou IBM Software Defined Infrastructure (SDI) and Flash Technical Sales Specialist Technical Computing and HPC IT Specialist and Architect Ext. Phone: (+34) 679.79.17.84 (internal 55834) E-mail: jordi.caubet at es.ibm.com > On 5 Oct 2020, at 08:19, Olaf Weiser wrote: > > ? > let me add a few comments from some very successful large installations in Eruope > > # InterOP > Even though (as Luis pointed to) , there is no support statement to run intermix DSS/ESS in general, it was ~, and is, and will be, ~ allowed for short term purposes, such as e.g migration. > The reason to not support those DSS/ESS mixed configuration in general is simply driven by the fact, that different release version of DSS/ESS potentially (not in every release, but sometimes) comes with different driver levels, (e.g. MOFED), OS, RDMA-settings, GPFS tuning, etc... > Those changes can have an impact/multiple impacts and therefore, we do not support that in general. Of course -and this would be the advice for every one - if you are faced the need to run a mixed configuration for e.g. a migration and/or e.g. 
cause of you need to temporary provide space etc... contact you IBM representative and settle to plan that accordingly.. > There will be (likely) some additional requirements/dependencies defined like driver versions, OS, and/or Scale versions, but you'll get a chance to run mixed configuration - temporary limited to your specific scenario. > > # Monitoring > No doubt, monitoring is essential and absolutely needed. - and/but - IBM wants customers to be very sensitive, what kind of additional software (=workload) gets installed on the ESS-IO servers. BTW, this rule applies as well to any other important GPFS node with special roles (e.g. any other NSD server etc) > But given the fact, that customer's usually manage and monitor their server farms from a central point of control (any 3rd party software), it is common/ best practice , that additionally monitor software(clients/endpoints) has to run on GPFS nodes, so as on ESS nodes too. > > If that way of acceptance applies for DSS too, you may want to double check with Lenovo ?! > > > #additionally GW functions > It would be a hot iron, to general allow routing on IO nodes. Similar to the mixed support approach, the field variety for such a statement would be hard(==impossible) to manage. As we all agree, additional network traffic can (and in fact will) impact GPFS. > In your special case, the expected data rates seems to me more than ok and acceptable to go with your suggested config (as long workloads remain on that level / monitor it accordingly as you are already obviously doing) > Again,to be on the safe side.. contact your IBM representative and I'm sure you 'll find a way.. > > > > kind regards.... > olaf > > > ----- Original message ----- > From: Jonathan Buzzard > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] Services on DSS/ESS nodes > Date: Sun, Oct 4, 2020 12:17 PM > > On 04/10/2020 10:29, Luis Bolinches wrote: > > Hi > > > > As stated on the same link you can do remote mounts from each other and > > be a supported setup. > > > > ? You can use the remote mount feature of IBM Spectrum Scale to share > > file system data across clusters.? > > > > You can, but imagine I have a DSS-G cluster, with 2PB of storage on it > which is quite modest in 2020. It is now end of life and for whatever > reason I decide I want to move to ESS instead. > > What any sane storage admin want to do at this stage is set the ESS, add > the ESS nodes to the existing cluster on the DSS-G then do a bit of > mmadddisk/mmdeldisk and sit back while the data is seemlessly moved from > the DSS-G to the ESS. Admittedly this might take a while :-) > > Then once all the data is moved a bit of mmdelnode and bingo the storage > has been migrated from DSS-G to ESS with zero downtime. > > As that is not allowed for what I presume are commercial reasons (you > could do it in reverse and presumable that is what IBM don't want) then > once you are down the rabbit hole of one type of storage the you are not > going to switch to a different one. > > You need to look at it from the perspective of the users. They frankly > could not give a monkeys what storage solution you are using. All they > care about is having usable storage and large amounts of downtime to > switch from one storage type to another is not really acceptable. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. 
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Salvo indicado de otro modo m?s arriba / Unless stated otherwise above: International Business Machines, S.A. Santa Hortensia, 26-28, 28002 Madrid Registro Mercantil de Madrid; Folio 1; Tomo 1525; Hoja M-28146 CIF A28-010791 -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Oct 5 09:40:56 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 5 Oct 2020 08:40:56 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> Message-ID: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> > I now need to check IBM are not going to throw a wobbler down the line > if I need to get support before deploying it to the DSS-G nodes :-) I know there were a lot of other emails about this ... I think you maybe want to be careful doing this. Whilst it might work when you setup the DSS-G like this, remember that the memory usage you are seeing at this point in time may not be what you always need. For example if you fail-over the recovery groups, you need to have enough free memory to handle this. E.g. a node failure, or more likely you are upgrading the building blocks. Personally I wouldn't run other things like this on my DSS-G storage nodes. We do run e.g. nrpe monitoring to collect and report faults, but this is pretty lightweight compared to everything else. They even removed support for running the gui packages on the IO nodes - the early DSS-G builds used the IO nodes for this, but now you need separate systems for this. Simon From jonathan.buzzard at strath.ac.uk Mon Oct 5 12:44:48 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 5 Oct 2020 12:44:48 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> Message-ID: <905a0bdb-b6a1-90e4-bf57-ed8edae6fb7c@strath.ac.uk> On 05/10/2020 07:27, Jordi Caubet Serrabou wrote: > ?Coming to the routing point, is there any reason why you need it ? I > mean, this is because GPFS trying to connect between compute nodes or > a reason outside GPFS scope ? > If the reason is GPFS, imho best approach - without knowledge of the > licensing you have - would be to use separate clusters: a storage > cluster and two compute clusters. The issue is that individual nodes want to talk to one another on the data interface. Which caught me by surprise as the cluster is set to admin mode central. The admin interface runs over ethernet for all nodes on a specific VLAN which which is given 802.1p priority 5 (that's Voice, < 10 ms latency and jitter). That saved a bunch of switching and cabling as you don't need the extra interface for the admin traffic. The cabling already significantly restricts airflow for a compute rack as it is, without adding a whole bunch more for a barely used admin interface. 
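For reference, the interface split described above is just standard Scale
configuration rather than anything exotic; a rough sketch (the node and
interface names here are made up, purely for illustration) looks something
like:

   # run admin commands centrally over the admin network
   mmchconfig adminMode=central
   # give each node a separate admin interface on the admin VLAN,
   # leaving the daemon (data) traffic on the existing interface
   mmchnode --admin-interface=node001-adm.example.com -N node001

The 802.1p marking itself is done on the switches/VLAN, not in Scale.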
It's like the people who wrote the best practice about separate interface for the admin traffic know very little about networking to be frankly honest. This is all last century technology. The nodes for undergraduate teaching only have a couple of 1Gb ethernet ports which would suck for storage usage. However they also have QDR Infiniband. That is because even though undergraduates can't run multinode jobs, on the old cluster the Lustre storage was delivered over Infiniband, so they got Infiniband cards. > Both compute clusters join using multicluster setup the storage > cluster. There is no need both compute clusters see each other, they > only need to see the storage cluster. One of the clusters using the > 10G, the other cluster using the IPoIB interface. > You need at least three quorum nodes in each compute cluster but if > licensing is per drive on the DSS, it is covered. Three clusters is starting to get complicated from an admin perspective. The biggest issue is coordinating maintenance and keep sufficient quorum nodes up. Maintenance on compute nodes is done via the job scheduler. I know some people think this is crazy, but it is in reality extremely elegant. We can schedule a reboot on a node as soon as the current job has finished (usually used for firmware upgrades). Or we can schedule a job to run as root (usually for applying updates) as soon as the current job has finished. As such we have no way of knowing when that will be for a given node, and there is a potential for all three quorum nodes to be down at once. Using this scheme we can seamlessly upgrade the nodes safe in the knowledge that a node is either busy and it's running on the current configuration or it has been upgraded and is running the new configuration. Consequently multinode jobs are guaranteed to have all nodes in the job running on the same configuration. The alternative is to drain the node, but there is only a 23% chance the node will become available during working hours leading to a significant loss of compute time when doing maintenance compared to our existing scheme where the loss of compute time is only as long as the upgrade takes to install. Pretty much the only time we have idle nodes is when the scheduler is reserving nodes ready to schedule a multi node job. Right now we have a single cluster with the quorum nodes being the two DSS-G nodes and the node used for backup. It is easy to ensure that quorum is maintained on these, they also all run real RHEL, where as the compute nodes run CentOS. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From carlz at us.ibm.com Mon Oct 5 13:09:02 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Mon, 5 Oct 2020 12:09:02 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes Message-ID: <714B599F-D06D-4D03-98F3-A2BF6F7360DB@us.ibm.com> Jordi wrote: ?Both compute clusters join using multicluster setup the storage cluster. There is no need both compute clusters see each other, they only need to see the storage cluster. One of the clusters using the 10G, the other cluster using the IPoIB interface. You need at least three quorum nodes in each compute cluster but if licensing is per drive on the DSS, it is covered.? 
As a side note: One of the reasons we designed capacity (per Disk or per TB) licensing the way we did was specifically so that you could make this kind of architectural decision on its own merits, without worrying about a licensing penalty. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1243111775] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From carlz at us.ibm.com Mon Oct 5 13:20:25 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Mon, 5 Oct 2020 12:20:25 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes Message-ID: <288C3527-32BA-43E2-B5EF-E79CC5765424@us.ibm.com> >> Mixing DSS and ESS in the same cluster is not a supported configuration. > > I know, it means you can never ever migrate your storage from DSS to ESS > without a full backup and restore. Who with any significant amount of > storage is going to want to do that? The logic behind this escapes me, > or perhaps in that scenario IBM might relax the rules for the migration > period. > We do indeed relax the rules temporarily for a migration. The reasoning behind this rule is for support. Many Scale support issues - often the toughest ones - are not about a single node, but about the cluster or network as a whole. So if you have a mix of IBM systems with systems supported by an OEM (this applies to any OEM by the way, not just Lenovo) and a cluster-wide issue, who are you going to call. (Well, in practice you?re going to call IBM and we?ll do our best to help you despite limits on our knowledge of the OEM systems?). --CZ Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_386371469] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From jonathan.buzzard at strath.ac.uk Mon Oct 5 14:39:12 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 5 Oct 2020 14:39:12 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk> <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk> Message-ID: On 05/10/2020 09:40, Simon Thompson wrote: >> I now need to check IBM are not going to throw a wobbler down the >> line if I need to get support before deploying it to the DSS-G >> nodes :-) > > I know there were a lot of other emails about this ... > > I think you maybe want to be careful doing this. Whilst it might work > when you setup the DSS-G like this, remember that the memory usage > you are seeing at this point in time may not be what you always need. > For example if you fail-over the recovery groups, you need to have > enough free memory to handle this. E.g. a node failure, or more > likely you are upgrading the building blocks. I think there is a lack of understanding on exactly how light weight keepalived is. It's the same code as on my routers which are admittedly different CPU's (MIPS to be precise) but memory usage (taking out shared memory usage - libc for example is loaded anyway) is under 200KB. 
A bash shell uses more memory... > > Personally I wouldn't run other things like this on my DSS-G storage > nodes. We do run e.g. nrpe monitoring to collect and report faults, > but this is pretty lightweight compared to everything else. They even > removed support for running the gui packages on the IO nodes - the > early DSS-G builds used the IO nodes for this, but now you need > separate systems for this. > And keepalived is in the same range as nrpe, which you do run :-) I have seen nrpe get out of hand and consume significant amounts of resources on a machine; the machine was ground to halt due to nrpe. One of the standard plugins was failing and sitting their busy waiting. Every five minutes it ran again. It of course decided to wait till ~7pm on a Friday to go wonky. By mid morning on Saturday it was virtually unresponsive, several minutes to get a shell... I would note that you can run keepalived quite happily on an Ubiquiti EdgeRouter X which has a dual core 880 MHz MIPS CPU with 256MB of RAM. Mikrotik have models with similar specs that run it too. On a dual Xeon Gold 6142 machine the usage of RAM and CPU by keepalived is noise. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From committee at io500.org Thu Oct 1 17:40:00 2020 From: committee at io500.org (committee at io500.org) Date: Thu, 01 Oct 2020 10:40:00 -0600 Subject: [gpfsug-discuss] IO500 SC20 Call for Submission Message-ID: <4a20ed6ae985a25c69d953e1ea633d62@io500.org> CALL FOR IO500 SUBMISSION Deadline: 30 October 2020 AoE Stabilization period: 1st October -- 9th October 2020 AoE The IO500 [1] is now accepting and encouraging submissions for the upcoming 7th IO500 list, to be revealed at the IO500 Virtual BOF during SC20. Once again, we are also accepting submissions to the 10 Node I/O Challenge to encourage submission of small scale results. The new ranked lists will be announced at our Virtual SC20 BoF. We hope to see you, and your results, there. A new change for the upcoming submission procedure is the introduction of a stabilization period that aims to harden the benchmark. The final benchmark is released at the end of this period. During the stabilization we encourage the community to test the proper execution of the benchmark and provide us with feedback. We will apply bug fixes to the code base and expect that results obtained will be valid as full submission. We also continue with another list for the Student Cluster Competition, since IO500 is used during this competition. Also new this year is that we have partnered with Anthony Kougkas' team at Illinois Institute of Technology to evaluate the submission metadata describing the storage system on which the test was run to improve the quality and usefulness of the data IO500 collects. You may be contacted by one of his students to clarify one or more of the metadata items from your submission(s). We would appreciate, but do not require, your cooperation to help improve the submission metadata quality. Results from their work will be fed back to improve our submission process for future lists. The IO500 benchmark suite is designed to be easy to run, and the community has multiple active support channels to help with any questions. Please submit results from your system, and we look forward to seeing many of you at SC20! 
Please note that submissions of all sizes are welcome, including multiple submissions from different storage systems/tiers at a single site. The website has customizable sorting so it is possible to submit on a small system and still get a very good per-client score, for example. Additionally, the list is about much more than just the raw rank; all submissions help the community by collecting and publishing a wider corpus of data. More details below. Following the success of the Top500 in collecting and analyzing historical trends in supercomputer technology and evolution, the IO500 [1] was created in 2017, published its first list at SC17, and has grown continuously since then. The need for such an initiative has long been known within High-Performance Computing; however, defining appropriate benchmarks had long been challenging. Despite this challenge, the community, after long and spirited discussion, finally reached consensus on a suite of benchmarks and a metric for resolving the scores into a single ranking. The multi-fold goals of the benchmark suite are as follows: * Maximizing simplicity in running the benchmark suite * Encouraging complexity in tuning for performance * Allowing submitters to highlight their "hero run" performance numbers * Forcing submitters to simultaneously report performance for challenging IO patterns. Specifically, the benchmark suite includes a hero-run of both IOR and mdtest configured however possible to maximize performance and establish an upper-bound for performance. It also includes an IOR and mdtest run with highly prescribed parameters in an attempt to determine a lower-bound on the performance. Finally, it includes a namespace search, as this has been determined to be a highly sought-after feature in HPC storage systems that have historically not been well-measured. Submitters are encouraged to share their tuning insights for publication. The goals of the community are also multi-fold: * Gather historical data for the sake of analysis and to aid predictions of storage futures * Collect tuning information to share valuable performance optimizations across the community * Encourage vendors and designers to optimize for workloads beyond "hero runs" * Establish bounded expectations for users, procurers, and administrators 10 NODE I/O CHALLENGE The 10 Node Challenge is conducted using the regular IO500 benchmark, however, with the rule that exactly 10 client nodes must be used to run the benchmark. You may use any shared storage with, e.g., any number of servers. When submitting for the IO500 list, you can opt-in for "Participate in the 10 compute node challenge only", then we will not include the results into the ranked list. Other 10-node node submissions will be included in the full list and in the ranked list. We will announce the result in a separate derived list and in the full list but not on the ranked IO500 list at https://io500.org/ [2] BIRDS-OF-A-FEATHER Once again, we encourage you to submit [1], to join our community, and to attend our virtual BoF "The IO500 and the Virtual Institute of I/O" at SC20, where we will announce the new IO500 list, the 10 node challenge list, and the Student Cluster Competition list. We look forward to answering any questions or concerns you might have. * [1] http://www.vi4io.org/io500/submission [3] Thanks, The IO500 Committee Links: ------ [1] http://io500.org/ [2] https://io500.org/ [3] http://www.vi4io.org/io500/submission -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From valdis.kletnieks at vt.edu Wed Oct 7 00:45:46 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Tue, 06 Oct 2020 19:45:46 -0400 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: Message-ID: <138651.1602027946@turing-police> On Sat, 03 Oct 2020 10:55:05 -0000, "Andrew Beattie" said: > Why do you need to run any kind of monitoring client on an IO server the > GUI / performance monitor already does all of that work for you and > collects the data on the dedicated EMS server. Does *ALL* that work for me? Will it toss you an alert if your sshd goes away, or if somebody's tossing packets that iptables is blocking for good reasons, or any of the many other things that a competent sysadmin wants to be alerted on that aren't GPFS, but which are things that Nagios and Zabbix and similar tools were invented to track? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Wed Oct 7 11:28:55 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 7 Oct 2020 10:28:55 +0000 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: <138651.1602027946@turing-police> References: <138651.1602027946@turing-police> Message-ID: Agreed ... Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*. Tell me that kswapd is having one of those days. Tell me rsyslogd has stopped sending for some reason. Tell me if there are long waiters on the hosts. Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ... Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems... Simon ?On 07/10/2020, 00:45, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Valdis Kl?tnieks" wrote: On Sat, 03 Oct 2020 10:55:05 -0000, "Andrew Beattie" said: > Why do you need to run any kind of monitoring client on an IO server the > GUI / performance monitor already does all of that work for you and > collects the data on the dedicated EMS server. Does *ALL* that work for me? Will it toss you an alert if your sshd goes away, or if somebody's tossing packets that iptables is blocking for good reasons, or any of the many other things that a competent sysadmin wants to be alerted on that aren't GPFS, but which are things that Nagios and Zabbix and similar tools were invented to track? From jonathan.buzzard at strath.ac.uk Wed Oct 7 13:14:45 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 7 Oct 2020 13:14:45 +0100 Subject: [gpfsug-discuss] Services on DSS/ESS nodes In-Reply-To: References: <138651.1602027946@turing-police> Message-ID: On 07/10/2020 11:28, Simon Thompson wrote: > Agreed ... > > Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*. > Tell me that kswapd is having one of those days. > Tell me rsyslogd has stopped sending for some reason. > Tell me if there are long waiters on the hosts. > Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ... > > Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems... > The problem is the developers know as much about looking after a system in the real world as a tea leaf knows the history of the East India Company. 
IMHO to even ask the question shows a total lack of understanding of the
issue. Consequently developers in their ivory towers have a habit of
developing things that are as useful as a chocolate teapot. Which, putting
it bluntly, makes them look like a bunch of twits to any competent
sysadmin. I would note this is not a problem unique to IBM, it's
developers in general.

The appropriate course of action would be not for IBM to develop a
monitoring tool of their own but to provide a bunch of plugins for the
popular monitoring tools that customers will already be using to monitor
their whole IT estate. Heaven forbid they could even run a poll to find
out which ones the actual customers of their products are interested in,
rather than wasting effort developing software their customers are not
actually interested in.

For my purposes there is I think an alternative. The actual routing of the
IP packets is not a service, it's a kernel configuration to have the
kernel route the packets :-) Keepalived just manages a floating IP
address. There are other options to achieve this. They are clunkier but
they sidestep IBM's silly rules.

I would however note at this point that at lots of sites all routing in
the data centre is done using BGP. It comes in part out of the zero trust
paradigm. I guess apparently running fail2ban is not permitted either. Can
I even run firewalld? As you can see, a "nothing else" policy quickly
becomes unsustainable IMHO. There is a disjuncture between the developers
in their ivory towers and the real world.

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From kkr at lbl.gov  Tue Oct 13 22:34:23 2020
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Tue, 13 Oct 2020 14:34:23 -0700
Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a panel?
Message-ID: 

Hi all,

By now you know SC will be digital this year. We are working towards some
SC events for the Spectrum Scale User Group, and using our usual slot of
Sunday did not seem like a great idea. So, we're planning a couple of
90-minute sessions and would like to do a panel during one of them. We'll
hope to do live Q&A, like an in-person Ask Me Anything session, but it's
probably a good idea to have a bank of questions ready as well. Also, that
way your question may get asked even if you can't make the live session -
we'll record these sessions for later viewing.

So, please send your questions for the panel and we'll get a list built
up. Better yet, attend the sessions live! Details to come, but for now,
hold these time slots:

November 16th - 8:00 AM Pacific/3:00 PM UTC
November 18th - 8:00 AM Pacific/3:00 PM UTC

Best,
Kristy

Kristy Kallback-Rose
Senior HPC Storage Systems Analyst
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From juergen.hannappel at desy.de  Wed Oct 21 17:13:01 2020
From: juergen.hannappel at desy.de (Hannappel, Juergen)
Date: Wed, 21 Oct 2020 18:13:01 +0200 (CEST)
Subject: [gpfsug-discuss] Mounting an nfs share on a CES node
Message-ID: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de>

Hi,
I have a CES node exporting some filesystems via smb and ganesha in a
standard CES setup. Now I want to mount an nfs share from a different,
non-CES server on this CES node.
This did not work:

mount -o -fstype=nfs4,minorversion=1,rw,rsize=65536,wsize=65536 some.other.server:/some/path /mnt/
mount.nfs: mount to NFS server 'some.other.server:/some/path' failed: RPC Error: Program unavailable

Does the CES software stack interfere with the nfs client setup?
It seems that at least with rpc-statd there is some conflict:

systemctl status rpc-statd
● rpc-statd.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2020-10-21 17:48:21 CEST; 22min ago
  Process: 19896 ExecStart=/usr/sbin/rpc.statd $STATDARGS (code=exited, status=1/FAILURE)

Oct 21 17:48:21 mynode systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Oct 21 17:48:21 mynode rpc.statd[19896]: Statd service already running!
Oct 21 17:48:21 mynode systemd[1]: rpc-statd.service: control process exited, code=exited status=1
Oct 21 17:48:21 mynode systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
Oct 21 17:48:21 mynode systemd[1]: Unit rpc-statd.service entered failed state.
Oct 21 17:48:21 mynode systemd[1]: rpc-statd.service failed.

-- 
Dr. Jürgen Hannappel  DESY/IT  Tel. : +49 40 8998-4616

From mnaineni at in.ibm.com  Thu Oct 22 04:38:59 2020
From: mnaineni at in.ibm.com (Malahal R Naineni)
Date: Thu, 22 Oct 2020 03:38:59 +0000
Subject: [gpfsug-discuss] Mounting an nfs share on a CES node
In-Reply-To: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de>
References: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de>
Message-ID: 

An HTML attachment was scrubbed...
URL: 

From andi at christiansen.xxx  Tue Oct 27 11:46:02 2020
From: andi at christiansen.xxx (Andi Christiansen)
Date: Tue, 27 Oct 2020 12:46:02 +0100 (CET)
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
Message-ID: <1109480230.484366.1603799162955@privateemail.com>

Hi all,

We have over a longer period used the S3 API within Spectrum Scale, and
that has shown that it does not support very many applications because of
limitations of the API.

Has anyone got any experience with any other product we can deploy on top
of Spectrum Scale that will give us a true S3 API with full functionality
and able to answer on port 443? As of now we use HAProxy to forward ssl
requests back and forth from the Scale S3 API.

We have looked at MinIO which seems to be fairly simple and maybe might
solve a lot of incompatibilities with client software. But the product
seems to be very badly documented, at least for me.

The idea is basically that a client uses their backup application (rubrik,
veeam etc.) to connect to a domain (for example backup.mycompany.com) with
their access and secret key and have access to their bucket only, and it
must be over https/ssl.

If someone has any knowledge of minio or any other product that might
solve our problem I will be glad to hear from you!

Thank you in advance!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From NISHAAN at za.ibm.com  Tue Oct 27 13:38:01 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Tue, 27 Oct 2020 15:38:01 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <1109480230.484366.1603799162955@privateemail.com>
References: <1109480230.484366.1603799162955@privateemail.com>
Message-ID: 

Hi Andi

The current S3 compatibility in Spectrum Scale is delivered via the Swift3
middleware. This middleware has since been replaced by s3api in later
versions of Swift.
Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html Not sure if there is any other way to talk HTTPS without using HAProxy. In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). Anyway, good luck with your testing. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 2020/10/27 13:59 Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 52733301.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From andi at christiansen.xxx Wed Oct 28 06:24:52 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 28 Oct 2020 07:24:52 +0100 (CET) Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: References: <1109480230.484366.1603799162955@privateemail.com> Message-ID: <2126571944.509878.1603866292369@privateemail.com> Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? 
Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example... I looked through your documentation and its very close to what we have set up today and found to not work... If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen.. Best Regards Andi Christiansen > On 10/27/2020 2:38 PM Nishaan Docrat wrote: > > > > Hi Andi > > The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. > > I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. > > You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html https://docs.openstack.org/swift/latest/s3_compat.html > > Not sure if there is any other way to talk HTTPS without using HAProxy. > > In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. > > https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ > > Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). > > Anyway, good luck with your testing. > > Kind Regards > > Nishaan Docrat > Client Technical Specialist - Storage Systems > IBM Systems Hardware > > Work: +27 (0)11 302 5001 > Mobile: +27 (0)81 040 3793 > Email: nishaan at za.ibm.com http://www.ibm.com/storage > > > > [Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withi]Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > Date: 2020/10/27 13:59 > Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > --------------------------------------------- > > > > Hi all, > > > > We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. 
> > > > Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. > > > > We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. > > The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. > > > > If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? > > Thank you in advance! > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 52733301.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From NISHAAN at za.ibm.com Wed Oct 28 06:45:29 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Wed, 28 Oct 2020 08:45:29 +0200 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: <2126571944.509878.1603866292369@privateemail.com> References: <1109480230.484366.1603799162955@privateemail.com> <2126571944.509878.1603866292369@privateemail.com> Message-ID: Hi Andi The s3api middleware does support multipart uploads.. https://docs.openstack.org/swift/latest/s3_compat.html The current version of Swift (PIKE) that is bundled with Spectrum Scale 5.0.X doesn't.. https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adm_ManagingOpenStackACLsviaAmazonS3API.htm According to the Spectrum Scale Roadmap, 5.1 is due out 2H20.. Not sure if someone from development can confirm the GA date. Does Veeam have a test utility? You could always test it using the current Swift AIO or if you can provide me with a test utility I can test that for you. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com From: Andi Christiansen To: gpfsug main discussion list , Nishaan Docrat Date: 2020/10/28 08:24 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some... This Message Is From an External Sender This message came from outside your organization. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... 
but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example... I looked through your documentation and its very close to what we have set up today and found to not work... If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen.. Best Regards Andi Christiansen On 10/27/2020 2:38 PM Nishaan Docrat wrote: Hi Andi The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html Not sure if there is any other way to talk HTTPS without using HAProxy. In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). Anyway, good luck with your testing. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withi Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 2020/10/27 13:59 Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application (rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. 
and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19991351.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From NISHAAN at za.ibm.com Wed Oct 28 07:12:55 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Wed, 28 Oct 2020 09:12:55 +0200 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: <2126571944.509878.1603866292369@privateemail.com> References: <1109480230.484366.1603799162955@privateemail.com> <2126571944.509878.1603866292369@privateemail.com> Message-ID: Hi Andi Sorry forgot to mention that I was told 5.1 will include the Swift Train release (2.23). The change from swift3 middleware to s3api was done in the Queens release (2.18) so 5.1 will definitely have multipart support. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com From: Andi Christiansen To: gpfsug main discussion list , Nishaan Docrat Date: 2020/10/28 08:24 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some... This Message Is From an External Sender This message came from outside your organization. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example... I looked through your documentation and its very close to what we have set up today and found to not work... If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen.. Best Regards Andi Christiansen On 10/27/2020 2:38 PM Nishaan Docrat wrote: Hi Andi The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. 
Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html Not sure if there is any other way to talk HTTPS without using HAProxy. In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). Anyway, good luck with your testing. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withi Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 2020/10/27 13:59 Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application (rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17810834.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From luis.bolinches at fi.ibm.com Wed Oct 28 07:15:21 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 28 Oct 2020 07:15:21 +0000 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: Message-ID: An HTML attachment was scrubbed... 
URL: 

From NISHAAN at za.ibm.com  Wed Oct 28 07:45:45 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Wed, 28 Oct 2020 09:45:45 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: 
References: 
Message-ID: 

Hi Luis

Thanks for your reply.. It should address Andi's issue as the underlying
Swift version is what is important and the functionality he needs is in
the latest releases (I was told 5.1 includes Swift Train which is the
latest version). Am sure there is a beta program for Spectrum Scale..
Perhaps Andi should speak to his software sales rep and ask to be included
on it to get access so that he can test.
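For a quick check without Veeam, something along these lines should
exercise multipart upload against the CES S3 endpoint (the endpoint URL,
bucket name and credentials below are placeholders, so treat it purely as
a sketch):

   # make a file larger than the AWS CLI multipart threshold (8 MB by default)
   dd if=/dev/urandom of=/tmp/mp-test.bin bs=1M count=64
   # a plain copy of a file this size uses multipart automatically
   aws --endpoint-url https://backup.mycompany.com s3 cp /tmp/mp-test.bin s3://test-bucket/mp-test.bin
   # or drive the multipart API explicitly
   aws --endpoint-url https://backup.mycompany.com s3api create-multipart-upload --bucket test-bucket --key mp-test.bin

If the backend does not implement multipart uploads, it is the cp and the
create-multipart-upload calls that will fail.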
Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com Inactive hide details for Andi Christiansen ---2020/10/28 08:24:55---Hi Nishaan, Thanks for you reply.Andi Christiansen ---2020/10/28 08:24:55---Hi Nishaan, Thanks for you reply. From: Andi Christiansen To: gpfsug main discussion list , Nishaan Docrat Date: 2020/10/28 08:24 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some... This Message Is From an External Sender This message came from outside your organization. Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or 5.0.5.1? Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example... I looked through your documentation and its very close to what we have set up today and found to not work... If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen.. Best Regards Andi Christiansen On 10/27/2020 2:38 PM Nishaan Docrat wrote: Hi Andi The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware. I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues. You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html Not sure if there is any other way to talk HTTPS without using HAProxy. In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here.. https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.). Anyway, good luck with your testing. Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withiAndi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. 
And that has shown that From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 2020/10/27 13:59 Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API. We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me. The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl. If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ? Thank you in advance! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 16781831.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From luis.bolinches at fi.ibm.com Wed Oct 28 07:51:30 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 28 Oct 2020 07:51:30 +0000 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=CDBB0C9CDFB9D9C48f9e8a93df938690918cCDB at .jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Ellei edellä ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3
Registered in Finland

From luis.bolinches at fi.ibm.com Wed Oct 28 07:51:30 2020
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Wed, 28 Oct 2020 07:51:30 +0000
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To:
References:
Message-ID:

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Robert.Oesterlin at nuance.com Thu Oct 29 11:16:13 2020
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Thu, 29 Oct 2020 11:16:13 +0000
Subject: [gpfsug-discuss] SSUG Digital Expert Talk: 11/4 - AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale
Message-ID: <77EA43ED-C430-42CA-872E-D2307F244775@nuance.com>

Reminder for our upcoming expert talk:

SSUG::Digital: Scalable multi-node training for AI workloads on NVIDIA DGX,
Red Hat OpenShift and IBM Spectrum Scale
November 4 @ 16:15 - 17:45 GMT

NVIDIA and IBM did a complex proof of concept to demonstrate the scaling of
an AI workload using NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale,
taking ResNet-50 and the segmentation of images from the Audi A2D2 dataset
as the example. The project team published an IBM Redpaper with all the
technical details and will present the key learnings and results.

Registration link for the Webex session:
https://www.spectrumscaleug.org/event/ssugdigital-multi-node-training-for-ai-workloads/

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From kkr at lbl.gov Thu Oct 29 21:43:02 2020
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Thu, 29 Oct 2020 14:43:02 -0700
Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a panel?
In-Reply-To:
References:
Message-ID: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>

Really? There's nothing you want to ask about GPFS/Spectrum Scale? There
will be access to developers and management alike, so I have to imagine you
have something to ask. Don't be shy.

Please help make this a lively discussion by submitting a question, or two.

Best,
Kristy

> On Oct 13, 2020, at 2:34 PM, Kristy Kallback-Rose wrote:
>
> Hi all,
>
> By now you know SC will be digital this year. We are working towards some
> SC events for the Spectrum Scale User Group, and using our usual slot of
> Sunday did not seem like a great idea. So, we're planning a couple of
> 90-minute sessions and would like to do a panel during one of them. We'll
> hope to do live Q&A, like an in-person Ask Me Anything session, but it's
> probably a good idea to have a bank of questions ready as well. Also, that
> way your question may get asked even if you can't make the live session;
> we'll record these sessions for later viewing.
>
> So, please send your questions for the panel and we'll get a list built
> up. Better yet, attend the sessions live! Details to come, but for now,
> hold these time slots:
>
> November 16th - 8:00 AM Pacific/3:00 PM UTC
>
> November 18th - 8:00 AM Pacific/3:00 PM UTC
>
> Best,
> Kristy
>
> Kristy Kallback-Rose
> Senior HPC Storage Systems Analyst
> National Energy Research Scientific Computing Center
> Lawrence Berkeley National Laboratory

From kkr at lbl.gov Thu Oct 29 21:49:34 2020
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Thu, 29 Oct 2020 14:49:34 -0700
Subject: [gpfsug-discuss] SC20 Sessions - Dates and times are settled, please join us!
Message-ID: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov>

Hi all,

The Spectrum Scale User Group will be hosting two 90-minute sessions at
SC20 this year and we hope you can join us. The first one is "Storage for
AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST, and the
second one is "What's new in Spectrum Scale 5.1?"
and will be held Wednesday, Nov. 18th from 11:00-12:30 EST.

Please see the calendar at https://www.spectrumscaleug.org/eventslist/2020-11/
and register by clicking on a session on the calendar and then the "Please
register here to join the session" link.

Best,
Kristy

Kristy Kallback-Rose
Senior HPC Storage Systems Analyst
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory

From heinrich.billich at id.ethz.ch Fri Oct 30 12:21:58 2020
From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD))
Date: Fri, 30 Oct 2020 12:21:58 +0000
Subject: [gpfsug-discuss] ganesha_mgr display_export - client not listed
Message-ID: <660DD807-C723-44EF-BC51-57EFB296FFC4@id.ethz.ch>

Hello,

An NFSv4 client of ganesha does not show up in the output of 'ganesha_mgr
display_export'. The client has an active mount, but it also shows some NFS
issues: some commands did hang, and the affected process just stays in
state D (uninterruptible sleep) according to 'ps', though not for the whole
mount.

I just wonder whether the client's IP should always show up in the output
of display_export once the client has issued a mount call, and whether its
absence indicates that something is broken. Putting it the other way round:
when is a client listed in the output of display_export, and when is it
removed from the list?

We are collecting more debug data; this is just something that caught my
eye.

Thank you,

Heiner

We run ganesha 2.7.5-ibm058.05 on a Spectrum Scale system on Red Hat 7.7.
I am cross-posting to the gpfsug mailing list.

--
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56

heinrich.billich at id.ethz.ch
========================

# ganesha_mgr display_export 37
Display export with id 37
export 37: path = /xxxx/yyy, pseudo = /xxx/yyy, tag = /xxx/yyy
Client type, CIDR version, CIDR address, CIDR mask, CIDR proto, Anonymous UID, Anonymous GID, Attribute timeout, Options, Set
a.b.c.198/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.143/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.236/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.34/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.70/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
a.b.c.71/32, 0, 0, 255, 1, 4294967294, 4294967294, 0, 1126195680, 1081209831
*, 0, 0, 0, 0, 4294967294, 4294967294, 0, 1126187490, 1081209831

From skylar2 at uw.edu Fri Oct 30 14:01:37 2020
From: skylar2 at uw.edu (Skylar Thompson)
Date: Fri, 30 Oct 2020 07:01:37 -0700
Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a panel?
In-Reply-To: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>
References: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>
Message-ID: <20201030140137.hakhxwppcmaoixy6@thargelion>

Here's one: How is IBM working to improve the integration between TSM and
GPFS?

We're in the biomedical space and have some overlapping regulatory
requirements around retention, which translate into complicated
INCLUDE/EXCLUDE rules that mmbackup has always had trouble processing. In
particular, we need to INCLUDE particular paths to set a management class,
but then EXCLUDE other paths, which results in mmbackup generating file
lists for dsmc that still contain the excluded paths. dsmc can exclude
them, but it logs every single one every time it runs.
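For concreteness, the kind of combination I mean looks roughly like the
following Spectrum Protect include-exclude fragment (the paths and
management-class names are invented for illustration, not our real rules):

    * dsmc evaluates these bottom-up; mmbackup reads the same file
    exclude.dir   /gpfs/projects/*/scratch
    exclude       /gpfs/projects/.../*.tmp
    include       /gpfs/projects/clinical/.../*   MC_RETAIN_LONG
    include       /gpfs/projects/.../*            MC_DEFAULT

With rules shaped like this, mmbackup still hands dsmc candidate lists that
contain paths matched by the exclude lines, and dsmc then logs each file it
skips.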
On Thu, Oct 29, 2020 at 02:43:02PM -0700, Kristy Kallback-Rose wrote:
> Really? There's nothing you want to ask about GPFS/Spectrum Scale? There
> will be access to developers and management alike, so I have to imagine
> you have something to ask. Don't be shy.
>
> Please help make this a lively discussion by submitting a question, or
> two.
>
> Best,
> Kristy

--
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department (UW Medicine), System Administrator
-- Foege Building S046, (206)-685-7354
-- Pronouns: He/Him/His

From cblack at nygenome.org Fri Oct 30 14:19:24 2020
From: cblack at nygenome.org (Christopher Black)
Date: Fri, 30 Oct 2020 14:19:24 +0000
Subject: [gpfsug-discuss] SC20 Sessions - Dates and times are settled, please join us!
In-Reply-To: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov>
References: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov>
Message-ID: <62E7471D-02B9-4C27-B0F0-4038CCB2C66E@nygenome.org>

Could you talk about upcoming work to address excessive prefetch when
reading small fractions of many large files?

Some bioinformatics workloads have a client node reading relatively small
regions of multiple 50 GB+ files. We've seen this trigger excessive
prefetch bandwidth (especially on a 16 MB block filesystem). Investigation
shows that much of the prefetched data is never read, but the cache fills
up, blocks get evicted, and then more prefetch happens.

We can avoid this by turning prefetch off, but that reduces the speed of
other workloads that read full files sequentially, and turning prefetch on
and off per job won't work well for our users.

We've heard this would be addressed in GPFS 5.1 at the earliest and have
provided an example workload to the developers. They've done some great
analysis and determined the problem is worse on large (16M) block
filesystems, which are now recommended, and the default, on new ESS
filesystems with sub-block allocation enabled.

Best,
Chris
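P.S. In case a concrete picture of the access pattern helps, it is
essentially the following (paths, offsets and sizes are made up for
illustration; the real workloads are bioinformatics tools, not shell
scripts):

    # read a small window out of each of many large files, then move on
    for f in /gpfs/fs1/project/*/sample*.bam; do
        # roughly 4 MiB starting 10 GiB into a 50 GiB+ file
        dd if="$f" of=/dev/null bs=1M skip=10240 count=4 2>/dev/null
    done

With sequential-style prefetch enabled, each of those small reads can pull
in far more data than the loop will ever touch.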
On 10/29/20, 5:49 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Kristy Kallback-Rose" wrote:

    Hi all,

    The Spectrum Scale User Group will be hosting two 90-minute sessions at
    SC20 this year and we hope you can join us. The first one is "Storage
    for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST, and
    the second one is "What's new in Spectrum Scale 5.1?" and will be held
    Wednesday, Nov. 18th from 11:00-12:30 EST.

    Please see the calendar at https://www.spectrumscaleug.org/eventslist/2020-11/
    and register by clicking on a session on the calendar and then the
    "Please register here to join the session" link.

    Best,
    Kristy

    Kristy Kallback-Rose
    Senior HPC Storage Systems Analyst
    National Energy Research Scientific Computing Center
    Lawrence Berkeley National Laboratory

    _______________________________________________
    gpfsug-discuss mailing list
    gpfsug-discuss at spectrumscale.org
    http://gpfsug.org/mailman/listinfo/gpfsug-discuss

________________________________

This message is for the recipient's use only, and may contain confidential,
privileged or protected information. Any unauthorized use or dissemination
of this communication is prohibited. If you received this message in error,
please immediately notify the sender and destroy all copies of this
message. The recipient should check this email and any attachments for the
presence of viruses, as we accept no liability for any damage caused by any
virus transmitted by this email.