From jonathan.buzzard at strath.ac.uk  Fri Oct  2 17:14:12 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Fri, 2 Oct 2020 17:14:12 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
Message-ID: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>


What if any are the rules around running additional services on DSS/ESS 
nodes with regard to support? Let me outline our scenario

Our main cluster uses 10Gbps ethernet for storage with the DSS-G nodes 
hooked up with redundant 40Gbps ethernet.

However we have an older cluster that is used for undergraduate teaching 
that only has 1Gbps ethernet and QDR Infiniband. With no money to 
upgrade this to 10Gbps ethernet to support this we flipped one of the 
ports on the ConnectX4 cards on each DSS-G node to Infiniband and using 
IPoIB run the teaching nodes in this way.

However it means that we need an Ethernet to Infiniband gateway as the 
ethernet only connected nodes want to talk to the Infiniband connected 
ones on their Infiniband address. Not a problem we grabbed an old spare 
machine installed CentOS and configured it up to act as a bridge, and 
deploy a custom route to all the ethernet only connected nodes. It has 
been working fine for a couple of years now.

The problem is that this becomes firstly a single point of failure, on 
hardware that is six years old now. Secondly to apply updates on the 
gateway machine means all the teaching nodes have to be drained and GPFS 
umounted to reboot the machine after updates have been installed. It is 
currently not getting patched as frequently as I would like (and 
required by the Scottish government).

So thinking about it I have come to the conclusion that the ideal 
situation would be to use the DSS-G nodes as the gateway and run 
keepalived to move the gateway ethernet IP address between the two 
machines. It is idea because as long as one DSS-G node is up then there 
is a functioning gateway and nodes don't get ejected from the cluster. 
If both DSS-G nodes are down then there is no GPFS to mount anyway and 
lack of a gateway is a moot point.

I grabbed a couple of the teaching compute nodes in the summer and 
trialed it out. It works a treat.

I now need to check IBM are not going to throw a wobbler down the line 
if I need to get support before deploying it to the DSS-G nodes :-)


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From abeattie at au1.ibm.com  Fri Oct  2 23:19:15 2020
From: abeattie at au1.ibm.com (Andrew Beattie)
Date: Fri, 2 Oct 2020 22:19:15 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
Message-ID: <OF777393BE.20C1535F-ON002585F5.007A9C47-1601677155073@notes.na.collabserv.com>


Jonathan,
I suggest you get a formal statement from Lenovo as the DSS-G Platform is
no longer an IBM platform.

But for ESS based platforms the answer would be, it is not supported to run
anything on the IO Servers other than GNR and the relevant Scale management
services, due to the fact that if you lose an IO Server, or if you in an
extended maintenance window the Server needs to host all the work that
would be being performed by both IO servers.

I don't know if Lenovo have different point if view.

Regards,

Andrew

Sent from my iPhone

> On 3 Oct 2020, at 02:14, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk>
wrote:
>
> 
> What if any are the rules around running additional services on DSS/ESS
> nodes with regard to support? Let me outline our scenario
>
> Our main cluster uses 10Gbps ethernet for storage with the DSS-G nodes
> hooked up with redundant 40Gbps ethernet.
>
> However we have an older cluster that is used for undergraduate teaching
> that only has 1Gbps ethernet and QDR Infiniband. With no money to
> upgrade this to 10Gbps ethernet to support this we flipped one of the
> ports on the ConnectX4 cards on each DSS-G node to Infiniband and using
> IPoIB run the teaching nodes in this way.
>
> However it means that we need an Ethernet to Infiniband gateway as the
> ethernet only connected nodes want to talk to the Infiniband connected
> ones on their Infiniband address. Not a problem we grabbed an old spare
> machine installed CentOS and configured it up to act as a bridge, and
> deploy a custom route to all the ethernet only connected nodes. It has
> been working fine for a couple of years now.
>
> The problem is that this becomes firstly a single point of failure, on
> hardware that is six years old now. Secondly to apply updates on the
> gateway machine means all the teaching nodes have to be drained and GPFS
> umounted to reboot the machine after updates have been installed. It is
> currently not getting patched as frequently as I would like (and
> required by the Scottish government).
>
> So thinking about it I have come to the conclusion that the ideal
> situation would be to use the DSS-G nodes as the gateway and run
> keepalived to move the gateway ethernet IP address between the two
> machines. It is idea because as long as one DSS-G node is up then there
> is a functioning gateway and nodes don't get ejected from the cluster.
> If both DSS-G nodes are down then there is no GPFS to mount anyway and
> lack of a gateway is a moot point.
>
> I grabbed a couple of the teaching compute nodes in the summer and
> trialed it out. It works a treat.
>
> I now need to check IBM are not going to throw a wobbler down the line
> if I need to get support before deploying it to the DSS-G nodes :-)
>
>
> JAB.
>
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201002/6b69501c/attachment.htm>

From jonathan.buzzard at strath.ac.uk  Sat Oct  3 11:06:41 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sat, 3 Oct 2020 11:06:41 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF777393BE.20C1535F-ON002585F5.007A9C47-1601677155073@notes.na.collabserv.com>
References: <OF777393BE.20C1535F-ON002585F5.007A9C47-1601677155073@notes.na.collabserv.com>
Message-ID: <7af0ac41-a280-5ecd-3658-7af761a4bf9b@strath.ac.uk>

On 02/10/2020 23:19, Andrew Beattie wrote:
> Jonathan,
> I suggest you get a formal statement from Lenovo as the DSS-G Platform 
> is no longer an IBM platform.
> 
> But for ESS based platforms the answer would be, it is not supported to 
> run anything on the IO Servers other than GNR and the relevant Scale 
> management services, due to the fact that if you lose an IO Server, or 
> if you in an extended maintenance window the Server needs to host all 
> the work that would be being performed by both IO servers.
> 

In the past ~500 days the Infiniband to Ethernet gateway has shifted 
~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the 
DSS-G nodes have each shifted several PB. The proposed additional 
traffic is a drop in the ocean.

On my actual routers which shift much more data (over 300TB externally) 
with an uptime of ~180 days at the moment the CPU time consumed by 
keepalived is just under 31 minutes or about 8 seconds a day. These are 
much punier CPU's too. The proposed additional CPU usage is another drop 
in the ocean.

Given Lenovo sold the *same* configuration with x3650's and SR650's the 
"need all the CPU grunt" is somewhat fishy. Between the bid being 
submitted and actual tender award the SR650's came out and we paid a bit 
extra to uplift to the newer server hardware with exactly the same disk 
configuration. I believe IBM have done the same with the ESS/GNR servers 
too over time the same applies there too.

IMHO given keepalived is a base RHEL package, IBM/Lenovo should be 
offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as 
a supported configuration for mixed network technology clusters :-)

Running a couple extra servers for this purpose is obnoxious from an 
environmental standpoint. That's IBM's green credentials out the window 
if you ask me.

I would note under those rules running a Nagios, Zabbix etc. client on 
the nodes is not permitted either. I would suggest that most sites would 
be rather unhappy about that :-)


 > I don't know if Lenovo have different point if view.
 >

Problem is when I ring up for support on my DSS-G I speak to an IBM 
employee not a Lenovo one :-)


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From abeattie at au1.ibm.com  Sat Oct  3 11:55:05 2020
From: abeattie at au1.ibm.com (Andrew Beattie)
Date: Sat, 3 Oct 2020 10:55:05 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <7af0ac41-a280-5ecd-3658-7af761a4bf9b@strath.ac.uk>
Message-ID: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>


Why do you need to run any kind of monitoring client on an IO server the
GUI / performance monitor already does all of that work for you and
collects the data on the dedicated EMS server.

If you have a small storage environment the. Yes the processor and memory
may feel like overkill, but tuned appropriately an IO server will use all
the memory you can give it to drive IO performance,

If you want to run a hybrid / non standard architecture then the IBM ESS /
DGSS platform may not be the right platform in comparison to a build your
own architecture, how ever you then take all the support issues onto your
self rather than it being the vendors problem.

Sent from my iPhone

> On 3 Oct 2020, at 20:06, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk>
wrote:
>
> On 02/10/2020 23:19, Andrew Beattie wrote:
>> Jonathan,
>> I suggest you get a formal statement from Lenovo as the DSS-G Platform
>> is no longer an IBM platform.
>>
>> But for ESS based platforms the answer would be, it is not supported to
>> run anything on the IO Servers other than GNR and the relevant Scale
>> management services, due to the fact that if you lose an IO Server, or
>> if you in an extended maintenance window the Server needs to host all
>> the work that would be being performed by both IO servers.
>>
>
> In the past ~500 days the Infiniband to Ethernet gateway has shifted
> ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the
> DSS-G nodes have each shifted several PB. The proposed additional
> traffic is a drop in the ocean.
>
> On my actual routers which shift much more data (over 300TB externally)
> with an uptime of ~180 days at the moment the CPU time consumed by
> keepalived is just under 31 minutes or about 8 seconds a day. These are
> much punier CPU's too. The proposed additional CPU usage is another drop
> in the ocean.
>
> Given Lenovo sold the *same* configuration with x3650's and SR650's the
> "need all the CPU grunt" is somewhat fishy. Between the bid being
> submitted and actual tender award the SR650's came out and we paid a bit
> extra to uplift to the newer server hardware with exactly the same disk
> configuration. I believe IBM have done the same with the ESS/GNR servers
> too over time the same applies there too.
>
> IMHO given keepalived is a base RHEL package, IBM/Lenovo should be
> offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as
> a supported configuration for mixed network technology clusters :-)
>
> Running a couple extra servers for this purpose is obnoxious from an
> environmental standpoint. That's IBM's green credentials out the window
> if you ask me.
>
> I would note under those rules running a Nagios, Zabbix etc. client on
> the nodes is not permitted either. I would suggest that most sites would
> be rather unhappy about that :-)
>
>
>> I don't know if Lenovo have different point if view.
>>
>
> Problem is when I ring up for support on my DSS-G I speak to an IBM
> employee not a Lenovo one :-)
>
>
> JAB.
>
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201003/b5de5e36/attachment.htm>

From luis.bolinches at fi.ibm.com  Sat Oct  3 12:19:36 2020
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Sat, 3 Oct 2020 11:19:36 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
Message-ID: <OFBC4CB60E.7F91267F-ON002585F6.003E3808-1601723976459@notes.na.collabserv.com>

Are you mixing those ESS DSS in the same cluster? Or you are only running DSS


https://www.ibm.com/support/knowledgecenter/SSYSP8/gnrfaq.html?view=kc#supportqs__building

Mixing DSS and ESS in the same cluster is not a supported configuration.

You really need to talk with Lenovo as is your vendor. The fact that in your region your support is being given by an IBMer or not is not a relevant point. High enough in the chain always will end at IBM on any region as GNR is IBM tech for 17 years (yes 17) so if weird enough even on regions where Lenovo might do even third level it might end on development and/or research. But that is a Lenovo/IBM agreement not you and IBM. 

So please get the support statement from Lenovo about this and pls share it if you want/can so we all learn their position. 

Thanks. 


--
Cheers

> On 3. Oct 2020, at 13.55, Andrew Beattie <abeattie at au1.ibm.com> wrote:
> 
> ?
> Why do you need to run any kind of monitoring client on an IO server the GUI / performance monitor already does all of that work for you and collects the data on the dedicated EMS server.
> 
> If you have a small storage environment the. Yes the processor and memory may feel like overkill, but tuned appropriately an IO server will use all the memory you can give it to drive IO performance, 
> 
> If you want to run a hybrid / non standard architecture then the IBM ESS / DGSS platform may not be the right platform in comparison to a build your own architecture, how ever you then take all the support issues onto your self rather than it being the vendors problem. 
> 
> Sent from my iPhone
> 
> > On 3 Oct 2020, at 20:06, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk> wrote:
> > 
> > On 02/10/2020 23:19, Andrew Beattie wrote:
> >> Jonathan,
> >> I suggest you get a formal statement from Lenovo as the DSS-G Platform 
> >> is no longer an IBM platform.
> >> 
> >> But for ESS based platforms the answer would be, it is not supported to 
> >> run anything on the IO Servers other than GNR and the relevant Scale 
> >> management services, due to the fact that if you lose an IO Server, or 
> >> if you in an extended maintenance window the Server needs to host all 
> >> the work that would be being performed by both IO servers.
> >> 
> > 
> > In the past ~500 days the Infiniband to Ethernet gateway has shifted 
> > ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the 
> > DSS-G nodes have each shifted several PB. The proposed additional 
> > traffic is a drop in the ocean.
> > 
> > On my actual routers which shift much more data (over 300TB externally) 
> > with an uptime of ~180 days at the moment the CPU time consumed by 
> > keepalived is just under 31 minutes or about 8 seconds a day. These are 
> > much punier CPU's too. The proposed additional CPU usage is another drop 
> > in the ocean.
> > 
> > Given Lenovo sold the *same* configuration with x3650's and SR650's the 
> > "need all the CPU grunt" is somewhat fishy. Between the bid being 
> > submitted and actual tender award the SR650's came out and we paid a bit 
> > extra to uplift to the newer server hardware with exactly the same disk 
> > configuration. I believe IBM have done the same with the ESS/GNR servers 
> > too over time the same applies there too.
> > 
> > IMHO given keepalived is a base RHEL package, IBM/Lenovo should be 
> > offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as 
> > a supported configuration for mixed network technology clusters :-)
> > 
> > Running a couple extra servers for this purpose is obnoxious from an 
> > environmental standpoint. That's IBM's green credentials out the window 
> > if you ask me.
> > 
> > I would note under those rules running a Nagios, Zabbix etc. client on 
> > the nodes is not permitted either. I would suggest that most sites would 
> > be rather unhappy about that :-)
> > 
> > 
> >> I don't know if Lenovo have different point if view.
> >> 
> > 
> > Problem is when I ring up for support on my DSS-G I speak to an IBM 
> > employee not a Lenovo one :-)
> > 
> > 
> > JAB.
> > 
> > -- 
> > Jonathan A. Buzzard Tel: +44141-5483420
> > HPC System Administrator, ARCHIE-WeSt.
> > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
> > 
> 

Ellei edell? ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3 
Registered in Finland

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201003/7806094c/attachment.htm>

From jonathan.buzzard at strath.ac.uk  Sat Oct  3 18:16:33 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sat, 3 Oct 2020 18:16:33 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
References: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
Message-ID: <d25eba8f-244b-f610-a49f-8826e77ec6b6@strath.ac.uk>

On 03/10/2020 11:55, Andrew Beattie wrote:
> Why do you need to run any kind of monitoring client on an IO server the 
> GUI / performance monitor already does all of that work for you and 
> collects the data on the dedicated EMS server.

Because any remotely sensible admin demands a single pane service 
monitoring system. If I have to look at A for everything but my DSS-G 
and B for my DSS-G that's an epic fail.

I often feel there is a huge disjuncture between the people that develop 
systems and those that look after them; they think the world revolves 
around them. It is clear this is one of those  cases.

> 
> If you have a small storage environment the. Yes the processor and 
> memory may feel like overkill, but tuned appropriately an IO server will 
> use all the memory you can give it to drive IO performance,

Right but the SR650's came with not only more CPU but more RAM than the 
x3650's. In which case why only 192GB of RAM? The SR650's can take much 
more than that. Why not 384GB of RAM :-) Right now we have a shade over 
50GB of RAM being unused. Been way for like ever because we naughtily 
have a influx DB client setup for a Grafana dashboard. We also 
presumably naughtily have remote syslog to Splunk.

> 
> If you want to run a hybrid / non standard architecture then the IBM ESS 
> / DGSS platform may not be the right platform in comparison to a build 
> your own architecture, how ever you then take all the support issues 
> onto your self rather than it being the vendors problem.
> 
I don't see anything that says you can't have some clients ethernet 
connected and some Infiniband connected. That of course requires a 
gateway, and the most logical place to put it is on the ESS or DSS nodes 
IMHO. I will see what Lenovo has to say, but looks like the IBM position 
is decidedly let's burn the planet, who gives a dam.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From jonathan.buzzard at strath.ac.uk  Sat Oct  3 18:16:39 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sat, 3 Oct 2020 18:16:39 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OFBC4CB60E.7F91267F-ON002585F6.003E3808-1601723976459@notes.na.collabserv.com>
References: <OFBC4CB60E.7F91267F-ON002585F6.003E3808-1601723976459@notes.na.collabserv.com>
Message-ID: <d475e10b-2bea-aac3-3031-3c6dc348ce86@strath.ac.uk>

On 03/10/2020 12:19, Luis Bolinches wrote:
> Are you mixing those ESS DSS in the same cluster? Or you are only 
> running DSS
> 

Only running DSS. We are too far down the rabbit hole to ever switch to 
ESS now.

> 
> Mixing DSS and ESS in the same cluster is not a supported configuration.
>

I know, it means you can never ever migrate your storage from DSS to ESS 
without a full backup and restore. Who with any significant amount of 
storage is going to want to do that? The logic behind this escapes me, 
or perhaps in that scenario IBM might relax the rules for the migration 
period.


> You really need to talk with Lenovo as is your vendor. The fact that in 
> your region your support is being given by an IBMer or not is not a 
> relevant point. High enough in the chain always will end at IBM on any 
> region as GNR is IBM tech for 17 years (yes 17) so if weird enough even 
> on regions where Lenovo might do even third level it might end on 
> development and/or research. But that is a Lenovo/IBM agreement not you 
> and IBM.
> 
> So please get the support statement from Lenovo about this and pls share 
> it if you want/can so we all learn their position.
> 

Will attempt that, though I do think it should be a supported config out 
the box :-)


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From luis.bolinches at fi.ibm.com  Sun Oct  4 10:29:34 2020
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Sun, 4 Oct 2020 09:29:34 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <d475e10b-2bea-aac3-3031-3c6dc348ce86@strath.ac.uk>
Message-ID: <OF078DF2EA.0852E4DE-ON002585F7.00342558-1601803774728@notes.na.collabserv.com>

Hi

As stated on the same link you can do remote mounts from each other and be a supported setup. 

? You can use the remote mount feature of IBM Spectrum Scale to share file system data across clusters.?

--
Cheers

> On 3. Oct 2020, at 20.16, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk> wrote:
> 
> ?On 03/10/2020 12:19, Luis Bolinches wrote:
>> Are you mixing those ESS DSS in the same cluster? Or you are only
>> running DSS
> 
> Only running DSS. We are too far down the rabbit hole to ever switch to 
> ESS now.
> 
>> Mixing DSS and ESS in the same cluster is not a supported configuration.
> 
> I know, it means you can never ever migrate your storage from DSS to ESS 
> without a full backup and restore. Who with any significant amount of 
> storage is going to want to do that? The logic behind this escapes me, 
> or perhaps in that scenario IBM might relax the rules for the migration 
> period.
> 
> 
>> You really need to talk with Lenovo as is your vendor. The fact that in
>> your region your support is being given by an IBMer or not is not a
>> relevant point. High enough in the chain always will end at IBM on any
>> region as GNR is IBM tech for 17 years (yes 17) so if weird enough even
>> on regions where Lenovo might do even third level it might end on
>> development and/or research. But that is a Lenovo/IBM agreement not you
>> and IBM.
>> So please get the support statement from Lenovo about this and pls share
>> it if you want/can so we all learn their position.
> 
> Will attempt that, though I do think it should be a supported config out 
> the box :-)
> 
> 
> JAB.
> 
> -- 
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Ellei edell? ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3 
Registered in Finland

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201004/43585e71/attachment.htm>

From jonathan.buzzard at strath.ac.uk  Sun Oct  4 11:17:30 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sun, 4 Oct 2020 11:17:30 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF078DF2EA.0852E4DE-ON002585F7.00342558-1601803774728@notes.na.collabserv.com>
References: <OF078DF2EA.0852E4DE-ON002585F7.00342558-1601803774728@notes.na.collabserv.com>
Message-ID: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk>

On 04/10/2020 10:29, Luis Bolinches wrote:
> Hi
> 
> As stated on the same link you can do remote mounts from each other and 
> be a supported setup.
> 
> ??You can use the remote mount feature of IBM Spectrum Scale to share 
> file system data across clusters.?
> 

You can, but imagine I have a DSS-G cluster, with 2PB of storage on it 
which is quite modest in 2020. It is now end of life and for whatever 
reason I decide I want to move to ESS instead.

What any sane storage admin want to do at this stage is set the ESS, add 
the ESS nodes to the existing cluster on the DSS-G then do a bit of 
mmadddisk/mmdeldisk and sit back while the data is seemlessly moved from 
the DSS-G to the ESS. Admittedly this might take a while :-)

Then once all the data is moved a bit of mmdelnode and bingo the storage 
has been migrated from DSS-G to ESS with zero downtime.

As that is not allowed for what I presume are commercial reasons (you 
could do it in reverse and presumable that is what IBM don't want) then 
once you are down the rabbit hole of one type of storage the you are not 
going to switch to a different one.

You need to look at it from the perspective of the users. They frankly 
could not give a monkeys what storage solution you are using. All they 
care about is having usable storage and large amounts of downtime to 
switch from one storage type to another is not really acceptable.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From olaf.weiser at de.ibm.com  Mon Oct  5 07:19:40 2020
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Mon, 5 Oct 2020 06:19:40 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk>
References: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk>,
	<OF078DF2EA.0852E4DE-ON002585F7.00342558-1601803774728@notes.na.collabserv.com>
Message-ID: <OF7BE40A26.30F06268-ON002585F8.001E36FC-002585F8.0022C2BE@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/47f72836/attachment.htm>

From jordi.caubet at es.ibm.com  Mon Oct  5 07:27:39 2020
From: jordi.caubet at es.ibm.com (Jordi Caubet Serrabou)
Date: Mon, 5 Oct 2020 06:27:39 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF7BE40A26.30F06268-ON002585F8.001E36FC-002585F8.0022C2BE@notes.na.collabserv.com>
Message-ID: <OF2E966B1D.F37B6B20-ON002585F8.00237D51-1601879259249@notes.na.collabserv.com>

?Coming to the routing point, is there any reason why you need it ? I mean, this is because GPFS trying to connect between compute nodes or a reason outside GPFS scope ?
If the reason is GPFS,  imho best approach - without knowledge of the licensing you have - would be to use separate clusters: a storage cluster and two compute clusters.

Both compute clusters join using multicluster setup the storage cluster. There is no need both compute clusters see each other, they only need to see the storage cluster. One of the clusters using the 10G, the other cluster using the IPoIB interface.
You need at least three quorum nodes in each compute cluster but if licensing is per drive on the DSS, it is covered.
--
Jordi Caubet Serrabou
IBM Software Defined Infrastructure (SDI) and Flash Technical Sales Specialist
Technical Computing and HPC IT Specialist and Architect
Ext. Phone: (+34) 679.79.17.84 (internal 55834)
E-mail: jordi.caubet at es.ibm.com

> On 5 Oct 2020, at 08:19, Olaf Weiser <olaf.weiser at de.ibm.com> wrote:
> 
> ?
> let me add a few comments from some very successful large installations in Eruope
>  
> # InterOP
> Even though (as Luis pointed to) , there is no support statement to run intermix DSS/ESS in general, it was ~, and is, and will be, ~ allowed for short term purposes, such as e.g migration.
> The reason to not support those DSS/ESS mixed configuration in general is simply driven by the fact, that different release version of DSS/ESS potentially (not in every release, but sometimes)  comes with different driver levels, (e.g. MOFED), OS, RDMA-settings, GPFS tuning,  etc...
> Those changes can have an impact/multiple impacts and therefore, we do not support that in general. Of course -and this would be the advice for every one - if you are faced the need to run a mixed configuration for e.g. a migration and/or e.g. cause of you need to temporary provide space etc... contact you IBM representative and settle to plan that accordingly..
> There will be (likely) some additional requirements/dependencies defined  like  driver versions, OS,  and/or Scale versions, but you'll get a chance to run mixed configuration - temporary limited to your specific scenario.
>  
> # Monitoring
> No doubt, monitoring is essential and absolutely needed. - and/but - IBM wants customers to be very sensitive, what kind of additional software (=workload) gets installed on the ESS-IO servers. BTW, this rule applies as well to any other important GPFS node with special roles (e.g. any other NSD server etc)
> But given the fact, that customer's usually manage and monitor their server farms from a central point of control (any 3rd party software), it is common/ best practice , that additionally monitor software(clients/endpoints) has to run on GPFS nodes, so as on ESS nodes too.
>  
> If that way of acceptance applies for DSS too, you may want to double check with Lenovo ?!
>  
>  
> #additionally GW functions
> It would be a hot iron, to general allow routing on IO nodes. Similar to the mixed support approach, the field variety for such a statement would be hard(==impossible) to manage. As we all agree, additional network traffic can (and in fact will) impact GPFS.
> In your special case, the expected data rates seems to me more than ok and acceptable to go with your suggested config (as long workloads remain on that level / monitor it accordingly as you are already obviously doing) 
> Again,to be on the safe side.. contact your IBM representative and I'm sure you 'll find a way..
>  
>  
>  
> kind regards....
> olaf
>  
>  
> ----- Original message -----
> From: Jonathan Buzzard <jonathan.buzzard at strath.ac.uk>
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug-discuss at spectrumscale.org
> Cc:
> Subject: [EXTERNAL] Re: [gpfsug-discuss] Services on DSS/ESS nodes
> Date: Sun, Oct 4, 2020 12:17 PM
>  
> On 04/10/2020 10:29, Luis Bolinches wrote:
> > Hi
> >
> > As stated on the same link you can do remote mounts from each other and
> > be a supported setup.
> >
> > ? You can use the remote mount feature of IBM Spectrum Scale to share
> > file system data across clusters.?
> >
> 
> You can, but imagine I have a DSS-G cluster, with 2PB of storage on it
> which is quite modest in 2020. It is now end of life and for whatever
> reason I decide I want to move to ESS instead.
> 
> What any sane storage admin want to do at this stage is set the ESS, add
> the ESS nodes to the existing cluster on the DSS-G then do a bit of
> mmadddisk/mmdeldisk and sit back while the data is seemlessly moved from
> the DSS-G to the ESS. Admittedly this might take a while :-)
> 
> Then once all the data is moved a bit of mmdelnode and bingo the storage
> has been migrated from DSS-G to ESS with zero downtime.
> 
> As that is not allowed for what I presume are commercial reasons (you
> could do it in reverse and presumable that is what IBM don't want) then
> once you are down the rabbit hole of one type of storage the you are not
> going to switch to a different one.
> 
> You need to look at it from the perspective of the users. They frankly
> could not give a monkeys what storage solution you are using. All they
> care about is having usable storage and large amounts of downtime to
> switch from one storage type to another is not really acceptable.
> 
> 
> JAB.
> 
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
>  
>  
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 

Salvo indicado de otro modo m?s arriba / Unless stated otherwise above:
International Business Machines, S.A.
Santa Hortensia, 26-28, 28002 Madrid
Registro Mercantil de Madrid; Folio 1; Tomo 1525; Hoja M-28146
CIF A28-010791

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/1846e9d6/attachment.htm>

From S.J.Thompson at bham.ac.uk  Mon Oct  5 09:40:56 2020
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Mon, 5 Oct 2020 08:40:56 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
Message-ID: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>

>    I now need to check IBM are not going to throw a wobbler down the line 
>    if I need to get support before deploying it to the DSS-G nodes :-)

I know there were a lot of other emails about this ...

I think you maybe want to be careful doing this. Whilst it might work when you setup the DSS-G like this, remember that the memory usage you are seeing at this point in time may not be what you always need. For example if you fail-over the recovery groups, you need to have enough free memory to handle this. E.g. a node failure, or more likely you are upgrading the building blocks.

Personally I wouldn't run other things like this on my DSS-G storage nodes. We do run e.g. nrpe monitoring to collect and report faults, but this is pretty lightweight compared to everything else. They even removed support for running the gui packages on the IO nodes - the early DSS-G builds used the IO nodes for this, but now you need separate systems for this.

Simon


From jonathan.buzzard at strath.ac.uk  Mon Oct  5 12:44:48 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Mon, 5 Oct 2020 12:44:48 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>
References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
	<73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>
Message-ID: <905a0bdb-b6a1-90e4-bf57-ed8edae6fb7c@strath.ac.uk>


On 05/10/2020 07:27, Jordi Caubet Serrabou wrote:

 > ?Coming to the routing point, is there any reason why you need it ? I
 > mean, this is because GPFS trying to connect between compute nodes or
 > a reason outside GPFS scope ?
 > If the reason is GPFS,  imho best approach - without knowledge of the
 > licensing you have - would be to use separate clusters: a storage
 > cluster and two compute clusters.

The issue is that individual nodes want to talk to one another on the 
data interface. Which caught me by surprise as the cluster is set to 
admin mode central.

The admin interface runs over ethernet for all nodes on a specific VLAN 
which which is given 802.1p priority 5 (that's Voice, < 10 ms latency 
and jitter). That saved a bunch of switching and cabling as you don't 
need the extra interface for the admin traffic. The cabling already 
significantly restricts airflow for a compute rack as it is, without 
adding a whole bunch more for a barely used admin interface.

It's like the people who wrote the best practice about separate 
interface for the admin traffic know very little about networking to be 
frankly honest. This is all last century technology.

The nodes for undergraduate teaching only have a couple of 1Gb ethernet 
ports which would suck for storage usage. However they also have QDR 
Infiniband. That is because even though undergraduates can't run 
multinode jobs, on the old cluster the Lustre storage was delivered over 
Infiniband, so they got Infiniband cards.

 > Both compute clusters join using multicluster setup the storage
 > cluster. There is no need both compute clusters see each other, they
 > only need to see the storage cluster. One of the clusters using the
 > 10G, the other cluster using the IPoIB interface.
 > You need at least three quorum nodes in each compute cluster but if
 > licensing is per drive on the DSS, it is covered.

Three clusters is starting to get complicated from an admin perspective. 
The biggest issue is coordinating maintenance and keep sufficient quorum 
nodes up.

Maintenance on compute nodes is done via the job scheduler. I know some 
people think this is crazy, but it is in reality extremely elegant.

We can schedule a reboot on a node as soon as the current job has 
finished (usually used for firmware upgrades). Or we can schedule a job 
to run as root (usually for applying updates) as soon as the current job 
has finished. As such we have no way of knowing when that will be for a 
given node, and there is a potential for all three quorum nodes to be 
down at once.

Using this scheme we can seamlessly upgrade the nodes safe in the 
knowledge that a node is either busy and it's running on the current 
configuration or it has been upgraded and is running the new 
configuration. Consequently multinode jobs are guaranteed to have all 
nodes in the job running on the same configuration.

The alternative is to drain the node, but there is only a 23% chance the 
node will become available during working hours leading to a significant 
loss of compute time when doing maintenance compared to our existing 
scheme where the loss of compute time is only as long as the upgrade 
takes to install. Pretty much the only time we have idle nodes is when 
the scheduler is reserving nodes ready to schedule a multi node job.

Right now we have a single cluster with the quorum nodes being the two 
DSS-G nodes and the node used for backup. It is easy to ensure that 
quorum is maintained on these, they also all run real RHEL, where as the 
compute nodes run CentOS.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From carlz at us.ibm.com  Mon Oct  5 13:09:02 2020
From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com)
Date: Mon, 5 Oct 2020 12:09:02 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
Message-ID: <714B599F-D06D-4D03-98F3-A2BF6F7360DB@us.ibm.com>


Jordi wrote:
?Both compute clusters join using multicluster setup the storage cluster. There is no need both compute clusters see each other, they only need to see the storage cluster. One of the clusters using the 10G, the other cluster using the IPoIB interface.
You need at least three quorum nodes in each compute cluster but if licensing is per drive on the DSS, it is covered.?

As a side note: One of the reasons we designed capacity (per Disk or per TB) licensing the way we did was specifically so that you could make this kind of architectural decision on its own merits, without worrying about a licensing penalty.


Carl Zetie
Program Director
Offering Management
Spectrum Scale
----
(919) 473 3318 ][ Research Triangle Park
carlz at us.ibm.com

[signature_1243111775]


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/7eb5d683/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 69558 bytes
Desc: image001.png
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/7eb5d683/attachment.png>

From carlz at us.ibm.com  Mon Oct  5 13:20:25 2020
From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com)
Date: Mon, 5 Oct 2020 12:20:25 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
Message-ID: <288C3527-32BA-43E2-B5EF-E79CC5765424@us.ibm.com>

>> Mixing DSS and ESS in the same cluster is not a supported configuration.
>
> I know, it means you can never ever migrate your storage from DSS to ESS
> without a full backup and restore. Who with any significant amount of
> storage is going to want to do that? The logic behind this escapes me,
> or perhaps in that scenario IBM might relax the rules for the migration
> period.
>

We do indeed relax the rules temporarily for a migration.

The reasoning behind this rule is for support. Many Scale support issues - often the toughest ones - are not about a single node, but about the cluster or network as a whole. So if you have a mix of IBM systems with systems supported by an OEM (this applies to any OEM by the way, not just Lenovo) and a cluster-wide issue, who are you going to call. (Well, in practice you?re going to call IBM and we?ll do our best to help you despite limits on our knowledge of the OEM systems?).

--CZ


Carl Zetie
Program Director
Offering Management
Spectrum Scale
----
(919) 473 3318 ][ Research Triangle Park
carlz at us.ibm.com

[signature_386371469]


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/8629eaed/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 69558 bytes
Desc: image001.png
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/8629eaed/attachment.png>

From jonathan.buzzard at strath.ac.uk  Mon Oct  5 14:39:12 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Mon, 5 Oct 2020 14:39:12 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>
References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
	<73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>
Message-ID: <abf37ce3-aa29-b4e1-fab6-12673b7aad67@strath.ac.uk>

On 05/10/2020 09:40, Simon Thompson wrote:
>> I now need to check IBM are not going to throw a wobbler down the
>> line if I need to get support before deploying it to the DSS-G
>> nodes :-)
> 
> I know there were a lot of other emails about this ...
> 
> I think you maybe want to be careful doing this. Whilst it might work
> when you setup the DSS-G like this, remember that the memory usage
> you are seeing at this point in time may not be what you always need.
> For example if you fail-over the recovery groups, you need to have
> enough free memory to handle this. E.g. a node failure, or more
> likely you are upgrading the building blocks.

I think there is a lack of understanding on exactly how light weight 
keepalived is.

It's the same code as on my routers which are admittedly different CPU's 
(MIPS to be precise) but memory usage (taking out shared memory usage - 
libc for example is loaded anyway) is under 200KB. A bash shell uses 
more memory...

> 
> Personally I wouldn't run other things like this on my DSS-G storage
> nodes. We do run e.g. nrpe monitoring to collect and report faults,
> but this is pretty lightweight compared to everything else. They even
> removed support for running the gui packages on the IO nodes - the
> early DSS-G builds used the IO nodes for this, but now you need
> separate systems for this.
> 

And keepalived is in the same range as nrpe, which you do run :-) I have 
seen nrpe get out of hand and consume significant amounts of resources 
on a machine; the machine was ground to halt due to nrpe. One of the 
standard plugins was failing and sitting their busy waiting. Every five 
minutes it ran again. It of course decided to wait till ~7pm on a Friday 
to go wonky. By mid morning on Saturday it was virtually unresponsive, 
several minutes to get a shell...

I would note that you can run keepalived quite happily on an Ubiquiti 
EdgeRouter X which has a dual core 880 MHz MIPS CPU with 256MB of RAM. 
Mikrotik have models with similar specs that run it too.

On a dual Xeon Gold 6142 machine the usage of RAM and CPU by keepalived 
is noise.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From committee at io500.org  Thu Oct  1 17:40:00 2020
From: committee at io500.org (committee at io500.org)
Date: Thu, 01 Oct 2020 10:40:00 -0600
Subject: [gpfsug-discuss] IO500 SC20 Call for Submission
Message-ID: <4a20ed6ae985a25c69d953e1ea633d62@io500.org>

CALL FOR IO500 SUBMISSION

Deadline: 30 October 2020 AoE 

Stabilization period: 1st October -- 9th October 2020 AoE 

The IO500 [1] is now accepting and encouraging submissions for the
upcoming 7th IO500 list, to be revealed at the IO500 Virtual BOF during
SC20. Once again, we are also accepting submissions to the 10 Node I/O
Challenge to encourage submission of small scale results. The new ranked
lists will be announced at our Virtual SC20 BoF. We hope to see you, and
your results, there.  

A new change for the upcoming submission procedure is the introduction
of a stabilization period that aims to harden the benchmark. The final
benchmark is released at the end of this period. During the
stabilization we encourage the community to test the proper execution of
the benchmark and provide us with feedback. We will apply bug fixes to
the code base and expect that results obtained will be valid as full
submission. We also continue with another list for the Student Cluster
Competition, since IO500 is used during this competition. 

Also new this year is that we have partnered with Anthony Kougkas' team
at Illinois Institute of Technology to evaluate the submission metadata
describing the storage system on which the test was run to improve the
quality and usefulness of the data IO500 collects. You may be contacted
by one of his students to clarify one or more of the metadata items from
your submission(s). We would appreciate, but do not require, your
cooperation to help improve the submission metadata quality. Results
from their work will be fed back to improve our submission process for
future lists. 

The IO500 benchmark suite is designed to be easy to run, and the
community has multiple active support channels to help with any
questions. Please submit results from your system, and we look forward
to seeing many of you at SC20! Please note that submissions of all sizes
are welcome, including multiple submissions from different storage
systems/tiers at a single site.  The website has customizable sorting so
it is possible to submit on a small system and still get a very good
per-client score, for example. Additionally, the list is about much more
than just the raw rank; all submissions help the community by collecting
and publishing a wider corpus of data. More details below. 

Following the success of the Top500 in collecting and analyzing
historical trends in supercomputer technology and evolution, the IO500
[1] was created in 2017, published its first list at SC17, and has grown
continuously since then. The need for such an initiative has long been
known within High-Performance Computing; however, defining appropriate
benchmarks had long been challenging. Despite this challenge, the
community, after long and spirited discussion, finally reached consensus
on a suite of benchmarks and a metric for resolving the scores into a
single ranking. 

The multi-fold goals of the benchmark suite are as follows: 

 	* Maximizing simplicity in running the benchmark suite
 	* Encouraging complexity in tuning for performance
 	* Allowing submitters to highlight their "hero run" performance
numbers
 	* Forcing submitters to simultaneously report performance for
challenging IO patterns.

Specifically, the benchmark suite includes a hero-run of both IOR and
mdtest configured however possible to maximize performance and establish
an upper-bound for performance. It also includes an IOR and mdtest run
with highly prescribed parameters in an attempt to determine a
lower-bound on the performance. Finally, it includes a namespace search,
as this has been determined to be a highly sought-after feature in HPC
storage systems that have historically not been well-measured.
Submitters are encouraged to share their tuning insights for
publication. 

The goals of the community are also multi-fold: 

 	* Gather historical data for the sake of analysis and to aid
predictions of storage futures
 	* Collect tuning information to share valuable performance
optimizations across the community
 	* Encourage vendors and designers to optimize for workloads beyond
"hero runs"
 	* Establish bounded expectations for users, procurers, and
administrators

10 NODE I/O CHALLENGE

The 10 Node Challenge is conducted using the regular IO500 benchmark,
however, with the rule that exactly 10 client nodes must be used to run
the benchmark. You may use any shared storage with, e.g., any number of
servers. When submitting for the IO500 list, you can opt-in for
"Participate in the 10 compute node challenge only", then we will not
include the results into the ranked list. Other 10-node node submissions
will be included in the full list and in the ranked list. We will
announce the result in a separate derived list and in the full list but
not on the ranked IO500 list at https://io500.org/ [2]  

BIRDS-OF-A-FEATHER

Once again, we encourage you to submit [1], to join our community, and
to attend our virtual BoF "The IO500 and the Virtual Institute of I/O"
at SC20, where we will announce the new IO500 list, the 10 node
challenge list, and the Student Cluster Competition list. We look
forward to answering any questions or concerns you might have. 

 	* [1] http://www.vi4io.org/io500/submission [3] 

Thanks, 

The IO500 Committee <committee at io500.org> 

 
Links:
------
[1] http://io500.org/
[2] https://io500.org/
[3] http://www.vi4io.org/io500/submission
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201001/7611b021/attachment.htm>

From valdis.kletnieks at vt.edu  Wed Oct  7 00:45:46 2020
From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks)
Date: Tue, 06 Oct 2020 19:45:46 -0400
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
References: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
Message-ID: <138651.1602027946@turing-police>

On Sat, 03 Oct 2020 10:55:05 -0000, "Andrew Beattie" said:

> Why do you need to run any kind of monitoring client on an IO server the
> GUI / performance monitor already does all of that work for you and
> collects the data on the dedicated EMS server.

Does *ALL* that work for me?

Will it toss you an alert if your sshd goes away, or if somebody's tossing
packets that iptables is blocking for good reasons, or any of the many other
things that a competent sysadmin wants to be alerted on that aren't GPFS, but
which are things that Nagios and Zabbix and similar tools were invented
to track?


-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201006/822e7ca6/attachment.sig>

From S.J.Thompson at bham.ac.uk  Wed Oct  7 11:28:55 2020
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Wed, 7 Oct 2020 10:28:55 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <138651.1602027946@turing-police>
References: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
	<138651.1602027946@turing-police>
Message-ID: <FB188F2A-6E9D-49C5-8663-87547FD35BE3@bham.ac.uk>

Agreed ...

Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*.
Tell me that kswapd is having one of those days.
Tell me rsyslogd has stopped sending for some reason.
Tell me if there are long waiters on the hosts.
Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ...

Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems...

Simon

?On 07/10/2020, 00:45, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Valdis Kl?tnieks" <gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu> wrote:

    On Sat, 03 Oct 2020 10:55:05 -0000, "Andrew Beattie" said:

    > Why do you need to run any kind of monitoring client on an IO server the
    > GUI / performance monitor already does all of that work for you and
    > collects the data on the dedicated EMS server.

    Does *ALL* that work for me?

    Will it toss you an alert if your sshd goes away, or if somebody's tossing
    packets that iptables is blocking for good reasons, or any of the many other
    things that a competent sysadmin wants to be alerted on that aren't GPFS, but
    which are things that Nagios and Zabbix and similar tools were invented
    to track?


From jonathan.buzzard at strath.ac.uk  Wed Oct  7 13:14:45 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Wed, 7 Oct 2020 13:14:45 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <FB188F2A-6E9D-49C5-8663-87547FD35BE3@bham.ac.uk>
References: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
	<138651.1602027946@turing-police>
	<FB188F2A-6E9D-49C5-8663-87547FD35BE3@bham.ac.uk>
Message-ID: <e8cf9330-cf1f-e7ad-777e-6aeaade9b29b@strath.ac.uk>

On 07/10/2020 11:28, Simon Thompson wrote:
> Agreed ...
> 
> Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*.
> Tell me that kswapd is having one of those days.
> Tell me rsyslogd has stopped sending for some reason.
> Tell me if there are long waiters on the hosts.
> Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ...
> 
> Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems...
> 

The problem is the developers know as much about looking after a system 
in the real world as a tea leaf knows the history of the East India 
Company. IMHO to even ask the question shows a total lack of 
understanding of the issue.

Consequently developers in their ivory towers have a habit of developing 
things that are as useful as a chocolate tea pot. Which putting it 
bluntly a competent sysadmins makes them look like a bunch of twits. I 
would note this is not a problem unique to IBM, it's developers in general.

The appropriate course of action would be not for IBM to develop a 
monitoring tool of their own but to provide a bunch of plugins for the 
popular monitoring tools that customers will already be using to monitor 
their whole IT estate.

Heaven forbid they could even run a poll to find out which ones the 
actual customers of their products are interested in rather than wasting 
effort developing software their customers are not actually interested in.

For my purposes there is I think an alternative. The actual routing of 
the IP packets is not a service, it's a kernel configuration to have the 
kernel route that packets :-) Keepalived just manages a floating IP 
address. There are other options to achieve this. They are clunkier but 
they side step IBM's silly rules.

I would however note at this point that at lots of sites all routing in 
the data centre is done using BGP. It comes in part out of the zero 
trust paradigm. I guess apparently running fail2ban is not permitted 
either. Can I even run firewalld? As you can seen a nothing else policy 
quickly becomes unsustainable IMHO.

There is a disjuncture between the developers in their ivory towers and 
the real world.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From kkr at lbl.gov  Tue Oct 13 22:34:23 2020
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Tue, 13 Oct 2020 14:34:23 -0700
Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a
	panel?
Message-ID: <B8A6C2AA-B4B0-473B-B7EC-C751F0EC9D7C@lbl.gov>

Hi all,

	By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we?re planning a couple 90-minute sessions and would like to do a panel during one of them. We?ll hope to do live Q&A, like an in-person Ask Me Anything session, but it?s probably a good idea to have a bank of questions ready as well, Also, that way your question may get asked, even if you can?t make the live session ?we?ll record these sessions for later viewing. 

	So, please send your questions for the panel and we?ll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots:

November 16th - 8:00 AM Pacific/3:00 PM UTC 

November 18th - 8:00 AM Pacific/3:00 PM UTC 

Best,
Kristy

Kristy Kallback-Rose
Senior HPC Storage Systems Analyst
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201013/4bb40577/attachment.htm>

From juergen.hannappel at desy.de  Wed Oct 21 17:13:01 2020
From: juergen.hannappel at desy.de (Hannappel, Juergen)
Date: Wed, 21 Oct 2020 18:13:01 +0200 (CEST)
Subject: [gpfsug-discuss] Mounting an nfs share on a CES node
Message-ID: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de>

Hi,
I have a CES node exporting some filesystems vis smb and ganesha in a standard CES setup.
Now I want to mount a nfs share from a different, non-CES server on this CES node.
This did not work:
mount -o -fstype=nfs4,minorversion=1,rw,rsize=65536,wsize=65536 some.other.server:/some/path /mnt/
mount.nfs: mount to NFS server 'some.other.server:/some/path' failed: RPC Error: Program unavailable

Does the CES software stack interfere with the nfs client setup? It seems that at least with
rpc-statd there is some conflict:

 systemctl status rpc-statd
? rpc-statd.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2020-10-21 17:48:21 CEST; 22min ago
  Process: 19896 ExecStart=/usr/sbin/rpc.statd $STATDARGS (code=exited, status=1/FAILURE)

Oct 21 17:48:21 mynode systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Oct 21 17:48:21 mynode rpc.statd[19896]: Statd service already running!
Oct 21 17:48:21 mynode systemd[1]: rpc-statd.service: control process exited, code=exited status=1
Oct 21 17:48:21 mynode systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
Oct 21 17:48:21 mynode systemd[1]: Unit rpc-statd.service entered failed state.
Oct 21 17:48:21 mynode systemd[1]: rpc-statd.service failed.
-- 
Dr. J?rgen Hannappel  DESY/IT    Tel.  : +49 40 8998-4616


From mnaineni at in.ibm.com  Thu Oct 22 04:38:59 2020
From: mnaineni at in.ibm.com (Malahal R Naineni)
Date: Thu, 22 Oct 2020 03:38:59 +0000
Subject: [gpfsug-discuss] Mounting an nfs share on a CES node
In-Reply-To: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de>
References: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de>
Message-ID: <OF616CC8F9.AD263CD8-ON00258609.0012EE72-00258609.00140C7B@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201022/5b5db3bb/attachment.htm>

From andi at christiansen.xxx  Tue Oct 27 11:46:02 2020
From: andi at christiansen.xxx (Andi Christiansen)
Date: Tue, 27 Oct 2020 12:46:02 +0100 (CET)
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
Message-ID: <1109480230.484366.1603799162955@privateemail.com>

Hi all,


We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API..


Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API.


We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me.

The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl.


If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ?

Thank you in advance!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201027/2ffb4d78/attachment.htm>

From NISHAAN at za.ibm.com  Tue Oct 27 13:38:01 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Tue, 27 Oct 2020 15:38:01 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <1109480230.484366.1603799162955@privateemail.com>
References: <1109480230.484366.1603799162955@privateemail.com>
Message-ID: <OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>

Hi Andi

The current S3 compatibility in Spectrum Scale is delivered via the Swift3
middleware. This middleware has since been replaced by s3api in later
versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release
which will include the new s3api middleware.

I've tested the S3 compatibility with a few applications including Spectrum
Scale itself (i.e. Cloud Data Sharing to another Scale Object store using
S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran
a few application tools to test for an S3 compliant object stores and again
had no issues.

You can use s3compat to test the current compatibility.. Or you can check
here for the current compatibility..
https://docs.openstack.org/swift/latest/s3_compat.html

Not sure if there is any other way to talk HTTPS without using HAProxy.

In any case, I've documented the process to setup an S3 compliant object
store including supporting virtual-hosted style bucket addressing which you
can find here..

https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/

Most storage vendors including minio would not support the full S3 API
stack as alot of the calls are specific to AWS (like billing stuff etc.).

Anyway, good luck with your testing.

Kind Regards

Nishaan Docrat
Client Technical Specialist - Storage Systems
IBM Systems Hardware

Work: +27 (0)11 302 5001
Mobile: +27 (0)81 040 3793
Email: nishaan at za.ibm.com


From:	Andi Christiansen <andi at christiansen.xxx>
To:	"gpfsug-discuss at spectrumscale.org"
            <gpfsug-discuss at spectrumscale.org>
Date:	2020/10/27 13:59
Subject:	[EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API.
Sent by:	gpfsug-discuss-bounces at spectrumscale.org


Hi all,


We have over a longer period used the S3 API within spectrum Scale.. And
that has shown that it does not support very many applications because of
limitations of the API..


Has anyone got any experience with any other product we can deploy on-top
of Spectrum Scale that will give us a true S3 API with full functionalities
and able to answer on port 443? As of now we use HAProxy to forware ssl
request back and forth from Scale S3 API.


We have looked at MinIO which seems to be fairly simple and maybe might
solve a lot of incompatibilities with clients software. But the product
seems to be very badly documented at least for me.

The idea is basically that a client uses their backup application(rubrik,
veeam etc.) to connect to a domain(for example backup.mycompany.com) with
their access and secret key and have access to their bucket only. and it
must be over https/ssl.


If someone has any knowledge to minio or any other product that might solve
our problem I will be glad to hear from you! ?

Thank you in advance!
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201027/70cc1bc2/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 52733301.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201027/70cc1bc2/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201027/70cc1bc2/attachment.gif>

From andi at christiansen.xxx  Wed Oct 28 06:24:52 2020
From: andi at christiansen.xxx (Andi Christiansen)
Date: Wed, 28 Oct 2020 07:24:52 +0100 (CET)
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>
References: <1109480230.484366.1603799162955@privateemail.com>
	<OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>
Message-ID: <2126571944.509878.1603866292369@privateemail.com>

Hi Nishaan,

Thanks for you reply.

When you say 5.1? is that 5.1.x.x or 5.0.5.1?

Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? 


i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example...

I looked through your documentation and its very close to what we have set up today and found to not work... 

If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen..

Best Regards
Andi Christiansen

>     On 10/27/2020 2:38 PM Nishaan Docrat <nishaan at za.ibm.com> wrote:
> 
> 
> 
>     Hi Andi
> 
>     The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware.
> 
>     I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues.
> 
>     You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html https://docs.openstack.org/swift/latest/s3_compat.html
> 
>     Not sure if there is any other way to talk HTTPS without using HAProxy.
> 
>     In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here..
> 
>     https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/
> 
>     Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.).
> 
>     Anyway, good luck with your testing.
> 
>     Kind Regards
> 
>     Nishaan Docrat
>     Client Technical Specialist - Storage Systems
>     IBM Systems Hardware
> 
>     Work: +27 (0)11 302 5001
>     Mobile: +27 (0)81 040 3793
>     Email: nishaan at za.ibm.com http://www.ibm.com/storage
> 
> 
> 
>     [Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withi]Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that
> 
>     From: Andi Christiansen <andi at christiansen.xxx>
>     To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
>     Date: 2020/10/27 13:59
>     Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API.
>     Sent by: gpfsug-discuss-bounces at spectrumscale.org
> 
> 
>     ---------------------------------------------
> 
> 
> 
>     Hi all,
> 
> 
> 
>     We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API..
> 
> 
> 
>     Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API.
> 
> 
> 
>     We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me.
> 
>     The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl.
> 
> 
> 
>     If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ?
> 
>     Thank you in advance!
>     _______________________________________________
>     gpfsug-discuss mailing list
>     gpfsug-discuss at spectrumscale.org
>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
> 
> 
> 
>     _______________________________________________
>     gpfsug-discuss mailing list
>     gpfsug-discuss at spectrumscale.org
>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/b16944e4/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 52733301.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/b16944e4/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/b16944e4/attachment.gif>

From NISHAAN at za.ibm.com  Wed Oct 28 06:45:29 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Wed, 28 Oct 2020 08:45:29 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <2126571944.509878.1603866292369@privateemail.com>
References: <1109480230.484366.1603799162955@privateemail.com>
	<OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>
	<2126571944.509878.1603866292369@privateemail.com>
Message-ID: <OF4063F396.B7DA9D54-ON4225860F.0023CA79-4225860F.00251FA3@notes.na.collabserv.com>

Hi Andi

The s3api middleware does support multipart uploads..

https://docs.openstack.org/swift/latest/s3_compat.html

The current version of Swift (PIKE) that is bundled with Spectrum Scale
5.0.X doesn't..
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adm_ManagingOpenStackACLsviaAmazonS3API.htm

According to the Spectrum Scale Roadmap, 5.1 is due out 2H20.. Not sure if
someone from development can confirm the GA date.

Does Veeam have a test utility? You could always test it using the current
Swift AIO or if you can provide me with a test utility I can test that for
you.

Kind Regards

Nishaan Docrat
Client Technical Specialist - Storage Systems
IBM Systems Hardware

Work: +27 (0)11 302 5001
Mobile: +27 (0)81 040 3793
Email: nishaan at za.ibm.com


From:	Andi Christiansen <andi at christiansen.xxx>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>,
            Nishaan Docrat <NISHAAN at za.ibm.com>
Date:	2020/10/28 08:24
Subject:	[EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.


Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or
5.0.5.1? Some...
                                                                            
                                                                            
                   This Message Is From an External Sender                  
                   This message came from outside your organization.        
                                                                            
                                                                            
Hi Nishaan,

Thanks for you reply.

When you say 5.1? is that 5.1.x.x or 5.0.5.1?

Some of the limitations we have encountered is the multipart upload not
being supported and some md5sum that the s3 api does that veeam actually
dont like. also interms of the management on the Scale GUI, that has to be
on one of the S3 CES nodes in order to be able to show project, container
etc... but when you have a HAProxy for enabling SSL then a GUI is not
available as they both use port 443?


i know min.io is not the full stack of S3 API commands but as far as i can
read it comes with more features out of the box than Scale S3 does,
multipart for an example...

I looked through your documentation and its very close to what we have set
up today and found to not work...

If multipart uploads would be supported today on scale S3 i would think
about still using scale for the s3 part but as i expect that you talk about
5.1.x.x i dont see that being released any time soon? and dont know if that
is actually going to be supported in that release then i cant wait for that
to happen..

Best Regards
Andi Christiansen
      On 10/27/2020 2:38 PM Nishaan Docrat <nishaan at za.ibm.com> wrote:


      Hi Andi

      The current S3 compatibility in Spectrum Scale is delivered via the
      Swift3 middleware. This middleware has since been replaced by s3api
      in later versions of Swift. Spectrum Scale 5.1 will make use of Swift
      Train release which will include the new s3api middleware.

      I've tested the S3 compatibility with a few applications including
      Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale
      Object store using S3 API) and Spectrum Protect etc. and haven't had
      any issue. I've also ran a few application tools to test for an S3
      compliant object stores and again had no issues.

      You can use s3compat to test the current compatibility.. Or you can
      check here for the current compatibility..
      https://docs.openstack.org/swift/latest/s3_compat.html

      Not sure if there is any other way to talk HTTPS without using
      HAProxy.

      In any case, I've documented the process to setup an S3 compliant
      object store including supporting virtual-hosted style bucket
      addressing which you can find here..

      https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/


      Most storage vendors including minio would not support the full S3
      API stack as alot of the calls are specific to AWS (like billing
      stuff etc.).

      Anyway, good luck with your testing.

      Kind Regards

      Nishaan Docrat
      Client Technical Specialist - Storage Systems
      IBM Systems Hardware

      Work: +27 (0)11 302 5001
      Mobile: +27 (0)81 040 3793
      Email: nishaan at za.ibm.com


      Inactive hide details for Andi Christiansen ---2020/10/27
      13:59:30---Hi all, We have over a longer period used the S3 API withi
      Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a
      longer period used the S3 API within spectrum Scale.. And that has
      shown that

      From: Andi Christiansen <andi at christiansen.xxx>
      To: "gpfsug-discuss at spectrumscale.org"
      <gpfsug-discuss at spectrumscale.org>
      Date: 2020/10/27 13:59
      Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API.
      Sent by: gpfsug-discuss-bounces at spectrumscale.org


      Hi all,


      We have over a longer period used the S3 API within spectrum Scale..
      And that has shown that it does not support very many applications
      because of limitations of the API..


      Has anyone got any experience with any other product we can deploy
      on-top of Spectrum Scale that will give us a true S3 API with full
      functionalities and able to answer on port 443? As of now we use
      HAProxy to forware ssl request back and forth from Scale S3 API.


      We have looked at MinIO which seems to be fairly simple and maybe
      might solve a lot of incompatibilities with clients software. But the
      product seems to be very badly documented at least for me.

      The idea is basically that a client uses their backup application
      (rubrik, veeam etc.) to connect to a domain(for example
      backup.mycompany.com) with their access and secret key and have
      access to their bucket only. and it must be over https/ssl.


      If someone has any knowledge to minio or any other product that might
      solve our problem I will be glad to hear from you! ?

      Thank you in advance!
      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at spectrumscale.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss


      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at spectrumscale.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/4c5ddab4/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 19991351.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/4c5ddab4/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/4c5ddab4/attachment.gif>

From NISHAAN at za.ibm.com  Wed Oct 28 07:12:55 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Wed, 28 Oct 2020 09:12:55 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <2126571944.509878.1603866292369@privateemail.com>
References: <1109480230.484366.1603799162955@privateemail.com>
	<OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>
	<2126571944.509878.1603866292369@privateemail.com>
Message-ID: <OF5F31EC6E.6235CA80-ON4225860F.0027378A-4225860F.0027A28C@notes.na.collabserv.com>

Hi Andi

Sorry forgot to mention that I was told 5.1 will include the Swift Train
release (2.23). The change from swift3 middleware to s3api was done in the
Queens release (2.18)  so 5.1 will definitely have multipart support.

Kind Regards

Nishaan Docrat
Client Technical Specialist - Storage Systems
IBM Systems Hardware

Work: +27 (0)11 302 5001
Mobile: +27 (0)81 040 3793
Email: nishaan at za.ibm.com


From:	Andi Christiansen <andi at christiansen.xxx>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>,
            Nishaan Docrat <NISHAAN at za.ibm.com>
Date:	2020/10/28 08:24
Subject:	[EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.


Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or
5.0.5.1? Some...
                                                                            
                                                                            
                   This Message Is From an External Sender                  
                   This message came from outside your organization.        
                                                                            
                                                                            
Hi Nishaan,

Thanks for you reply.

When you say 5.1? is that 5.1.x.x or 5.0.5.1?

Some of the limitations we have encountered is the multipart upload not
being supported and some md5sum that the s3 api does that veeam actually
dont like. also interms of the management on the Scale GUI, that has to be
on one of the S3 CES nodes in order to be able to show project, container
etc... but when you have a HAProxy for enabling SSL then a GUI is not
available as they both use port 443?


i know min.io is not the full stack of S3 API commands but as far as i can
read it comes with more features out of the box than Scale S3 does,
multipart for an example...

I looked through your documentation and its very close to what we have set
up today and found to not work...

If multipart uploads would be supported today on scale S3 i would think
about still using scale for the s3 part but as i expect that you talk about
5.1.x.x i dont see that being released any time soon? and dont know if that
is actually going to be supported in that release then i cant wait for that
to happen..

Best Regards
Andi Christiansen
      On 10/27/2020 2:38 PM Nishaan Docrat <nishaan at za.ibm.com> wrote:


      Hi Andi

      The current S3 compatibility in Spectrum Scale is delivered via the
      Swift3 middleware. This middleware has since been replaced by s3api
      in later versions of Swift. Spectrum Scale 5.1 will make use of Swift
      Train release which will include the new s3api middleware.

      I've tested the S3 compatibility with a few applications including
      Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale
      Object store using S3 API) and Spectrum Protect etc. and haven't had
      any issue. I've also ran a few application tools to test for an S3
      compliant object stores and again had no issues.

      You can use s3compat to test the current compatibility.. Or you can
      check here for the current compatibility..
      https://docs.openstack.org/swift/latest/s3_compat.html

      Not sure if there is any other way to talk HTTPS without using
      HAProxy.

      In any case, I've documented the process to setup an S3 compliant
      object store including supporting virtual-hosted style bucket
      addressing which you can find here..

      https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/


      Most storage vendors including minio would not support the full S3
      API stack as alot of the calls are specific to AWS (like billing
      stuff etc.).

      Anyway, good luck with your testing.

      Kind Regards

      Nishaan Docrat
      Client Technical Specialist - Storage Systems
      IBM Systems Hardware

      Work: +27 (0)11 302 5001
      Mobile: +27 (0)81 040 3793
      Email: nishaan at za.ibm.com


      Inactive hide details for Andi Christiansen ---2020/10/27
      13:59:30---Hi all, We have over a longer period used the S3 API withi
      Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a
      longer period used the S3 API within spectrum Scale.. And that has
      shown that

      From: Andi Christiansen <andi at christiansen.xxx>
      To: "gpfsug-discuss at spectrumscale.org"
      <gpfsug-discuss at spectrumscale.org>
      Date: 2020/10/27 13:59
      Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API.
      Sent by: gpfsug-discuss-bounces at spectrumscale.org


      Hi all,


      We have over a longer period used the S3 API within spectrum Scale..
      And that has shown that it does not support very many applications
      because of limitations of the API..


      Has anyone got any experience with any other product we can deploy
      on-top of Spectrum Scale that will give us a true S3 API with full
      functionalities and able to answer on port 443? As of now we use
      HAProxy to forware ssl request back and forth from Scale S3 API.


      We have looked at MinIO which seems to be fairly simple and maybe
      might solve a lot of incompatibilities with clients software. But the
      product seems to be very badly documented at least for me.

      The idea is basically that a client uses their backup application
      (rubrik, veeam etc.) to connect to a domain(for example
      backup.mycompany.com) with their access and secret key and have
      access to their bucket only. and it must be over https/ssl.


      If someone has any knowledge to minio or any other product that might
      solve our problem I will be glad to hear from you! ?

      Thank you in advance!
      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at spectrumscale.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss


      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at spectrumscale.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/0aaf9fcd/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 17810834.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/0aaf9fcd/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/0aaf9fcd/attachment.gif>

From luis.bolinches at fi.ibm.com  Wed Oct 28 07:15:21 2020
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Wed, 28 Oct 2020 07:15:21 +0000
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <OF4063F396.B7DA9D54-ON4225860F.0023CA79-4225860F.00251FA3@notes.na.collabserv.com>
Message-ID: <OF06DC9BE9.BBFA65F5-ON0025860F.0027441B-0025860F.0027DBD0@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.1__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.2__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.1__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.2__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment-0001.gif>

From NISHAAN at za.ibm.com  Wed Oct 28 07:45:45 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Wed, 28 Oct 2020 09:45:45 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <OF06DC9BE9.BBFA65F5-ON0025860F.0027441B-0025860F.0027DBD0@notes.na.collabserv.com>
References: <OF4063F396.B7DA9D54-ON4225860F.0023CA79-4225860F.00251FA3@notes.na.collabserv.com>
	<OF06DC9BE9.BBFA65F5-ON0025860F.0027441B-0025860F.0027DBD0@notes.na.collabserv.com>
Message-ID: <OF047A1414.5894B72F-ON4225860F.002A5F54-4225860F.002AA438@notes.na.collabserv.com>

Hi Luis

Thanks for your reply.. It should address Andi's issue as the underlying
Swift version is what is important and the functionality he needs is in the
latest releases (I was told 5.1 includes Swift Train which is the latest
version).

Am sure there is a beta program for Spectrum Scale.. Perhaps Andi should
speak to his software sales rep and ask to be included on it to get access
so that he can test.

Kind Regards

Nishaan Docrat
Client Technical Specialist - Storage Systems
IBM Systems Hardware

Work: +27 (0)11 302 5001
Mobile: +27 (0)81 040 3793
Email: nishaan at za.ibm.com


From:	"Luis Bolinches" <luis.bolinches at fi.ibm.com>
To:	gpfsug-discuss at spectrumscale.org
Cc:	gpfsug-discuss at spectrumscale.org
Date:	2020/10/28 09:29
Subject:	[EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.
Sent by:	gpfsug-discuss-bounces at spectrumscale.org


Hi

5.1.x is going GA very soon (TM). Would it address the issues Andi sees on
his environment or not I cannot say.

I can take it with Andi for more details on the GA date


--
Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations /
Salutacions
Luis Bolinches
Consultant IT Specialist
IBM Spectrum Scale development
Mobile Phone: +358503112585

https://www.youracclaim.com/user/luis-bolinches

Ab IBM Finland Oy
Laajalahdentie 23
00330 Helsinki
Uusimaa - Finland

"If you always give you will always have" --  Anonymous


 ----- Original message -----
 From: "Nishaan Docrat" <NISHAAN at za.ibm.com>
 Sent by: gpfsug-discuss-bounces at spectrumscale.org
 To: Andi Christiansen <andi at christiansen.xxx>
 Cc: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.
 Date: Wed, Oct 28, 2020 08:47


 Hi Andi

 The s3api middleware does support multipart uploads..

 https://docs.openstack.org/swift/latest/s3_compat.html

 The current version of Swift (PIKE) that is bundled with Spectrum Scale
 5.0.X doesn't..
 https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adm_ManagingOpenStackACLsviaAmazonS3API.htm


 According to the Spectrum Scale Roadmap, 5.1 is due out 2H20.. Not sure if
 someone from development can confirm the GA date.

 Does Veeam have a test utility? You could always test it using the current
 Swift AIO or if you can provide me with a test utility I can test that for
 you.

 Kind Regards

 Nishaan Docrat
 Client Technical Specialist - Storage Systems
 IBM Systems Hardware

 Work: +27 (0)11 302 5001
 Mobile: +27 (0)81 040 3793
 Email: nishaan at za.ibm.com


 Inactive hide details for Andi Christiansen ---2020/10/28 08:24:55---Hi
 Nishaan, Thanks for you reply.Andi Christiansen ---2020/10/28
 08:24:55---Hi Nishaan, Thanks for you reply.

 From: Andi Christiansen <andi at christiansen.xxx>
 To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>,
 Nishaan Docrat <NISHAAN at za.ibm.com>
 Date: 2020/10/28 08:24
 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.


 Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or
 5.0.5.1? Some...
                                                                            
                                                                            
               This Message Is From an External Sender                      
               This message came from outside your organization.            
                                                                            
                                                                            
 Hi Nishaan,

 Thanks for you reply.

 When you say 5.1? is that 5.1.x.x or 5.0.5.1?

 Some of the limitations we have encountered is the multipart upload not
 being supported and some md5sum that the s3 api does that veeam actually
 dont like. also interms of the management on the Scale GUI, that has to be
 on one of the S3 CES nodes in order to be able to show project, container
 etc... but when you have a HAProxy for enabling SSL then a GUI is not
 available as they both use port 443?


 i know min.io is not the full stack of S3 API commands but as far as i can
 read it comes with more features out of the box than Scale S3 does,
 multipart for an example...

 I looked through your documentation and its very close to what we have set
 up today and found to not work...

 If multipart uploads would be supported today on scale S3 i would think
 about still using scale for the s3 part but as i expect that you talk
 about 5.1.x.x i dont see that being released any time soon? and dont know
 if that is actually going to be supported in that release then i cant wait
 for that to happen..

 Best Regards
 Andi Christiansen
             On 10/27/2020 2:38 PM Nishaan Docrat <nishaan at za.ibm.com>
             wrote:


             Hi Andi

             The current S3 compatibility in Spectrum Scale is delivered
             via the Swift3 middleware. This middleware has since been
             replaced by s3api in later versions of Swift. Spectrum Scale
             5.1 will make use of Swift Train release which will include
             the new s3api middleware.

             I've tested the S3 compatibility with a few applications
             including Spectrum Scale itself (i.e. Cloud Data Sharing to
             another Scale Object store using S3 API) and Spectrum Protect
             etc. and haven't had any issue. I've also ran a few
             application tools to test for an S3 compliant object stores
             and again had no issues.

             You can use s3compat to test the current compatibility.. Or
             you can check here for the current compatibility..
             https://docs.openstack.org/swift/latest/s3_compat.html

             Not sure if there is any other way to talk HTTPS without using
             HAProxy.

             In any case, I've documented the process to setup an S3
             compliant object store including supporting virtual-hosted
             style bucket addressing which you can find here..

             https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/


             Most storage vendors including minio would not support the
             full S3 API stack as alot of the calls are specific to AWS
             (like billing stuff etc.).

             Anyway, good luck with your testing.

             Kind Regards

             Nishaan Docrat
             Client Technical Specialist - Storage Systems
             IBM Systems Hardware

             Work: +27 (0)11 302 5001
             Mobile: +27 (0)81 040 3793
             Email: nishaan at za.ibm.com


             Inactive hide details for Andi Christiansen ---2020/10/27
             13:59:30---Hi all, We have over a longer period used the S3
             API withiAndi Christiansen ---2020/10/27 13:59:30---Hi all, We
             have over a longer period used the S3 API within spectrum
             Scale.. And that has shown that

             From: Andi Christiansen <andi at christiansen.xxx>
             To: "gpfsug-discuss at spectrumscale.org"
             <gpfsug-discuss at spectrumscale.org>
             Date: 2020/10/27 13:59
             Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3
             API.
             Sent by: gpfsug-discuss-bounces at spectrumscale.org


             Hi all,


             We have over a longer period used the S3 API within spectrum
             Scale.. And that has shown that it does not support very many
             applications because of limitations of the API..


             Has anyone got any experience with any other product we can
             deploy on-top of Spectrum Scale that will give us a true S3
             API with full functionalities and able to answer on port 443?
             As of now we use HAProxy to forware ssl request back and forth
             from Scale S3 API.


             We have looked at MinIO which seems to be fairly simple and
             maybe might solve a lot of incompatibilities with clients
             software. But the product seems to be very badly documented at
             least for me.

             The idea is basically that a client uses their backup
             application(rubrik, veeam etc.) to connect to a domain(for
             example backup.mycompany.com) with their access and secret key
             and have access to their bucket only. and it must be over
             https/ssl.


             If someone has any knowledge to minio or any other product
             that might solve our problem I will be glad to hear from you!
             ?

             Thank you in advance!
             _______________________________________________
             gpfsug-discuss mailing list
             gpfsug-discuss at spectrumscale.org
             http://gpfsug.org/mailman/listinfo/gpfsug-discuss


             _______________________________________________
             gpfsug-discuss mailing list
             gpfsug-discuss at spectrumscale.org
             http://gpfsug.org/mailman/listinfo/gpfsug-discuss


 _______________________________________________
 gpfsug-discuss mailing list
 gpfsug-discuss at spectrumscale.org
 http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Ellei edell? ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3
Registered in Finland
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/3cef7908/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 16781831.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/3cef7908/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/3cef7908/attachment.gif>

From luis.bolinches at fi.ibm.com  Wed Oct 28 07:51:30 2020
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Wed, 28 Oct 2020 07:51:30 +0000
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <OF047A1414.5894B72F-ON4225860F.002A5F54-4225860F.002AA438@notes.na.collabserv.com>
References: <OF047A1414.5894B72F-ON4225860F.002A5F54-4225860F.002AA438@notes.na.collabserv.com>,
	<OF4063F396.B7DA9D54-ON4225860F.0023CA79-4225860F.00251FA3@notes.na.collabserv.com><OF06DC9BE9.BBFA65F5-ON0025860F.0027441B-0025860F.0027DBD0@notes.na.collabserv.com>
Message-ID: <OF2F6C2E30.34B1F144-ON0025860F.002B29E9-0025860F.002B2B09@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/e02ad49e/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.1__=CDBB0C9CDFB9D9C48f9e8a93df938690918cCDB at .jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/e02ad49e/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.2__=CDBB0C9CDFB9D9C48f9e8a93df938690918cCDB at .gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/e02ad49e/attachment.gif>

From Robert.Oesterlin at nuance.com  Thu Oct 29 11:16:13 2020
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Thu, 29 Oct 2020 11:16:13 +0000
Subject: [gpfsug-discuss] SSUG Digital Expert Talk: 11/4 - AI workloads on
 NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale
Message-ID: <77EA43ED-C430-42CA-872E-D2307F244775@nuance.com>

Reminder for our  upcoming expert talk:

SSUG::Digital: Scalable multi-node training for AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale
November 4 @ 16:15 - 17:45 GMT

Nvidia and IBM did a complex proof-of-concept to demonstrate the scaling of AI workload using Nvidia DGX, Red Hat OpenShift and IBM Spectrum Scale at the example of ResNet-50 and the segmentation of images using the Audi A2D2 dataset. The project team published an IBM Redpaper with all the technical details and will present the key learnings and results.

Registration link for Webex session: https://www.spectrumscaleug.org/event/ssugdigital-multi-node-training-for-ai-workloads/


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201029/38f0946b/attachment.htm>

From kkr at lbl.gov  Thu Oct 29 21:43:02 2020
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Thu, 29 Oct 2020 14:43:02 -0700
Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a
	panel?
In-Reply-To: <B8A6C2AA-B4B0-473B-B7EC-C751F0EC9D7C@lbl.gov>
References: <B8A6C2AA-B4B0-473B-B7EC-C751F0EC9D7C@lbl.gov>
Message-ID: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>

Really? There?s nothing you want to ask about GPFS/Spectrum Scale? There will be access to developers and management alike, so I have to imagine you have something to ask? Don?t be shy. 

Please help make this a lively discussion by submitting a question, or two. 

Best,
Kristy

> On Oct 13, 2020, at 2:34 PM, Kristy Kallback-Rose <kkr at lbl.gov> wrote:
> 
> Hi all,
> 
> 	By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we?re planning a couple 90-minute sessions and would like to do a panel during one of them. We?ll hope to do live Q&A, like an in-person Ask Me Anything session, but it?s probably a good idea to have a bank of questions ready as well, Also, that way your question may get asked, even if you can?t make the live session ?we?ll record these sessions for later viewing. 
> 
> 	So, please send your questions for the panel and we?ll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots:
> 
> November 16th - 8:00 AM Pacific/3:00 PM UTC 
> 
> November 18th - 8:00 AM Pacific/3:00 PM UTC 
> 
> Best,
> Kristy
> 
> Kristy Kallback-Rose
> Senior HPC Storage Systems Analyst
> National Energy Research Scientific Computing Center
> Lawrence Berkeley National Laboratory
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201029/ec75fc5e/attachment.htm>

From kkr at lbl.gov  Thu Oct 29 21:49:34 2020
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Thu, 29 Oct 2020 14:49:34 -0700
Subject: [gpfsug-discuss] SC20  Sessions - Dates and times are settled,
	please join us!
Message-ID: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov>

Hi all,

	The Spectrum Scale User Group will be hosting two 90 minute sessions at SC20 this year and we hope you can join us. The first one is:

 "Storage for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST 

and the second one is 

"What's new in Spectrum Scale 5.1?" and will be held Wednesday, Nov. 18th from 11:00-12:30 EST.  

Please see the calendar at https://www.spectrumscaleug.org/eventslist/2020-11/ and register by clicking on a session on the calendar and then the "Please register here to join the session" link.

Best,
Kristy

Kristy Kallback-Rose
Senior HPC Storage Systems Analyst
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory


From heinrich.billich at id.ethz.ch  Fri Oct 30 12:21:58 2020
From: heinrich.billich at id.ethz.ch (Billich  Heinrich Rainer (ID SD))
Date: Fri, 30 Oct 2020 12:21:58 +0000
Subject: [gpfsug-discuss] 'ganesha_mgr display_export - client not listed
Message-ID: <660DD807-C723-44EF-BC51-57EFB296FFC4@id.ethz.ch>

Hello,

Some nfsv4 client of ganesha does not show up in the output of 'ganesha_mgr display_export'. The client has an active mount, but also shows some nfs issues, some commands did hang, the process just stays in state D (uninterruptible sleep) according to 'ps', but not the whole mount.  

I just wonder if the client's IP should always show up in the output of display_export once the client did issue a mount call and if the absence indicates that something is broken. 
Gutr,gut,
Putting it the other way round: When is a client listed in the output of display_export and when is it removed from the list?

We do collect more debug data, this is just something that catched my eye.
Thank you,

Heiner

We run ganesha 2.7.5-ibm058.05 on a spectrum scale system on RedHat 7.7.

I crosspost to the gpfsug mailing list.

-- 
=======================
Heinrich Billich
ETH Z?rich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich at id.ethz.ch
========================
 
 
# ganesha_mgr  display_export 37
Display export with id 37
export 37: path = /xxxx/yyy, pseudo = /xxx/yyy , tag = /xxx/yyy
 Client type,  CIDR version, CIDR address, CIDR mask, CIDR proto, Anonymous UID, Anonymous GID, Attribute timeout, Options, Set
  a.b.c.198/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.143/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.236/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.34/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.70/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.71/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 *,  0,  0,  0,  0,  4294967294,  4294967294,  0,  1126187490, 1081209831


From skylar2 at uw.edu  Fri Oct 30 14:01:37 2020
From: skylar2 at uw.edu (Skylar Thompson)
Date: Fri, 30 Oct 2020 07:01:37 -0700
Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a
 panel?
In-Reply-To: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>
References: <B8A6C2AA-B4B0-473B-B7EC-C751F0EC9D7C@lbl.gov>
	<3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>
Message-ID: <20201030140137.hakhxwppcmaoixy6@thargelion>

Here's one:

How is IBM working to improve the integration between TSM and GPFS? We're
in the biomedical space and have some overlapping regulatory requirements
around retention, which translate to complicated INCLUDE/EXCLUDE rules that
mmbackup has always had trouble processing. In particular, we need to be
able to INCLUDE particular paths to set a management class, but then
EXCLUDE particular paths, which results in mmbackup generating 
file lists for dsmc including those excluded paths, which dsmc can exclude
but it logs every single one every time it runs.

On Thu, Oct 29, 2020 at 02:43:02PM -0700, Kristy Kallback-Rose wrote:
> Really? There???s nothing you want to ask about GPFS/Spectrum Scale? There will be access to developers and management alike, so I have to imagine you have something to ask??? Don???t be shy. 
> 
> Please help make this a lively discussion by submitting a question, or two. 
> 
> Best,
> Kristy
> 
> > On Oct 13, 2020, at 2:34 PM, Kristy Kallback-Rose <kkr at lbl.gov> wrote:
> > 
> > Hi all,
> > 
> > 	By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we???re planning a couple 90-minute sessions and would like to do a panel during one of them. We???ll hope to do live Q&A, like an in-person Ask Me Anything session, but it???s probably a good idea to have a bank of questions ready as well, Also, that way your question may get asked, even if you can???t make the live session ???we???ll record these sessions for later viewing. 
> > 
> > 	So, please send your questions for the panel and we???ll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots:
> > 
> > November 16th - 8:00 AM Pacific/3:00 PM UTC 
> > 
> > November 18th - 8:00 AM Pacific/3:00 PM UTC 
> > 
> > Best,
> > Kristy
> > 
> > Kristy Kallback-Rose
> > Senior HPC Storage Systems Analyst
> > National Energy Research Scientific Computing Center
> > Lawrence Berkeley National Laboratory
> > 
> 

> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-- 
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department (UW Medicine), System Administrator
-- Foege Building S046, (206)-685-7354
-- Pronouns: He/Him/His


From cblack at nygenome.org  Fri Oct 30 14:19:24 2020
From: cblack at nygenome.org (Christopher Black)
Date: Fri, 30 Oct 2020 14:19:24 +0000
Subject: [gpfsug-discuss] SC20  Sessions - Dates and times are settled,
 please join us!
In-Reply-To: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov>
References: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov>
Message-ID: <62E7471D-02B9-4C27-B0F0-4038CCB2C66E@nygenome.org>

Could you talk about upcoming work to address excessive prefetch when reading small fractions of many large files?
Some bioinformatics workloads have a client node reading relatively small regions of multiple 50GB+ files. We've seen this trigger excessive prefetch bandwidth (especially on 16MB block filesystem). Investigation shows that much of the prefetched data is never read, but cache gets full, evicts blocks, then more prefetch happens. We can avoid this by turning prefetch off, but that reduces speed of other workloads that read full files sequentially.  Turning prefetch on and off based on job won't work well for our users.

We've heard this would be addressed in gpfs 5.1 at the earliest and have provided an example workload to devs. They've done some great analysis and determined the problem is worse on large (16M) block filesystems (which are now the recommended and default on new ess filesystems with sub-block allocation enabled).

Best,
Chris

?On 10/29/20, 5:49 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Kristy Kallback-Rose" <gpfsug-discuss-bounces at spectrumscale.org on behalf of kkr at lbl.gov> wrote:

    Hi all,

    The Spectrum Scale User Group will be hosting two 90 minute sessions at SC20 this year and we hope you can join us. The first one is:

     "Storage for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST

    and the second one is

    "What's new in Spectrum Scale 5.1?" and will be held Wednesday, Nov. 18th from 11:00-12:30 EST.

    Please see the calendar at https://urldefense.com/v3/__https://www.spectrumscaleug.org/eventslist/2020-11/__;!!C6sPl7C9qQ!G0wT65UH3HoMnjBM6_ZAVfZwWwJz5SoLE5gpB_LM0N8SNSU3TXItF31dfxG_8Pow$  and register by clicking on a session on the calendar and then the "Please register here to join the session" link.

    Best,
    Kristy

    Kristy Kallback-Rose
    Senior HPC Storage Systems Analyst
    National Energy Research Scientific Computing Center
    Lawrence Berkeley National Laboratory

    _______________________________________________
    gpfsug-discuss mailing list
    gpfsug-discuss at spectrumscale.org
    https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!C6sPl7C9qQ!G0wT65UH3HoMnjBM6_ZAVfZwWwJz5SoLE5gpB_LM0N8SNSU3TXItF31df0lybvoA$

________________________________

This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email.

From jonathan.buzzard at strath.ac.uk  Fri Oct  2 17:14:12 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Fri, 2 Oct 2020 17:14:12 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
Message-ID: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>


What if any are the rules around running additional services on DSS/ESS 
nodes with regard to support? Let me outline our scenario

Our main cluster uses 10Gbps ethernet for storage with the DSS-G nodes 
hooked up with redundant 40Gbps ethernet.

However we have an older cluster that is used for undergraduate teaching 
that only has 1Gbps ethernet and QDR Infiniband. With no money to 
upgrade this to 10Gbps ethernet to support this we flipped one of the 
ports on the ConnectX4 cards on each DSS-G node to Infiniband and using 
IPoIB run the teaching nodes in this way.

However it means that we need an Ethernet to Infiniband gateway as the 
ethernet only connected nodes want to talk to the Infiniband connected 
ones on their Infiniband address. Not a problem we grabbed an old spare 
machine installed CentOS and configured it up to act as a bridge, and 
deploy a custom route to all the ethernet only connected nodes. It has 
been working fine for a couple of years now.

The problem is that this becomes firstly a single point of failure, on 
hardware that is six years old now. Secondly to apply updates on the 
gateway machine means all the teaching nodes have to be drained and GPFS 
umounted to reboot the machine after updates have been installed. It is 
currently not getting patched as frequently as I would like (and 
required by the Scottish government).

So thinking about it I have come to the conclusion that the ideal 
situation would be to use the DSS-G nodes as the gateway and run 
keepalived to move the gateway ethernet IP address between the two 
machines. It is idea because as long as one DSS-G node is up then there 
is a functioning gateway and nodes don't get ejected from the cluster. 
If both DSS-G nodes are down then there is no GPFS to mount anyway and 
lack of a gateway is a moot point.

I grabbed a couple of the teaching compute nodes in the summer and 
trialed it out. It works a treat.

I now need to check IBM are not going to throw a wobbler down the line 
if I need to get support before deploying it to the DSS-G nodes :-)


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From abeattie at au1.ibm.com  Fri Oct  2 23:19:15 2020
From: abeattie at au1.ibm.com (Andrew Beattie)
Date: Fri, 2 Oct 2020 22:19:15 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
Message-ID: <OF777393BE.20C1535F-ON002585F5.007A9C47-1601677155073@notes.na.collabserv.com>


Jonathan,
I suggest you get a formal statement from Lenovo as the DSS-G Platform is
no longer an IBM platform.

But for ESS based platforms the answer would be, it is not supported to run
anything on the IO Servers other than GNR and the relevant Scale management
services, due to the fact that if you lose an IO Server, or if you in an
extended maintenance window the Server needs to host all the work that
would be being performed by both IO servers.

I don't know if Lenovo have different point if view.

Regards,

Andrew

Sent from my iPhone

> On 3 Oct 2020, at 02:14, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk>
wrote:
>
> 
> What if any are the rules around running additional services on DSS/ESS
> nodes with regard to support? Let me outline our scenario
>
> Our main cluster uses 10Gbps ethernet for storage with the DSS-G nodes
> hooked up with redundant 40Gbps ethernet.
>
> However we have an older cluster that is used for undergraduate teaching
> that only has 1Gbps ethernet and QDR Infiniband. With no money to
> upgrade this to 10Gbps ethernet to support this we flipped one of the
> ports on the ConnectX4 cards on each DSS-G node to Infiniband and using
> IPoIB run the teaching nodes in this way.
>
> However it means that we need an Ethernet to Infiniband gateway as the
> ethernet only connected nodes want to talk to the Infiniband connected
> ones on their Infiniband address. Not a problem we grabbed an old spare
> machine installed CentOS and configured it up to act as a bridge, and
> deploy a custom route to all the ethernet only connected nodes. It has
> been working fine for a couple of years now.
>
> The problem is that this becomes firstly a single point of failure, on
> hardware that is six years old now. Secondly to apply updates on the
> gateway machine means all the teaching nodes have to be drained and GPFS
> umounted to reboot the machine after updates have been installed. It is
> currently not getting patched as frequently as I would like (and
> required by the Scottish government).
>
> So thinking about it I have come to the conclusion that the ideal
> situation would be to use the DSS-G nodes as the gateway and run
> keepalived to move the gateway ethernet IP address between the two
> machines. It is idea because as long as one DSS-G node is up then there
> is a functioning gateway and nodes don't get ejected from the cluster.
> If both DSS-G nodes are down then there is no GPFS to mount anyway and
> lack of a gateway is a moot point.
>
> I grabbed a couple of the teaching compute nodes in the summer and
> trialed it out. It works a treat.
>
> I now need to check IBM are not going to throw a wobbler down the line
> if I need to get support before deploying it to the DSS-G nodes :-)
>
>
> JAB.
>
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201002/6b69501c/attachment-0001.htm>

From jonathan.buzzard at strath.ac.uk  Sat Oct  3 11:06:41 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sat, 3 Oct 2020 11:06:41 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF777393BE.20C1535F-ON002585F5.007A9C47-1601677155073@notes.na.collabserv.com>
References: <OF777393BE.20C1535F-ON002585F5.007A9C47-1601677155073@notes.na.collabserv.com>
Message-ID: <7af0ac41-a280-5ecd-3658-7af761a4bf9b@strath.ac.uk>

On 02/10/2020 23:19, Andrew Beattie wrote:
> Jonathan,
> I suggest you get a formal statement from Lenovo as the DSS-G Platform 
> is no longer an IBM platform.
> 
> But for ESS based platforms the answer would be, it is not supported to 
> run anything on the IO Servers other than GNR and the relevant Scale 
> management services, due to the fact that if you lose an IO Server, or 
> if you in an extended maintenance window the Server needs to host all 
> the work that would be being performed by both IO servers.
> 

In the past ~500 days the Infiniband to Ethernet gateway has shifted 
~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the 
DSS-G nodes have each shifted several PB. The proposed additional 
traffic is a drop in the ocean.

On my actual routers which shift much more data (over 300TB externally) 
with an uptime of ~180 days at the moment the CPU time consumed by 
keepalived is just under 31 minutes or about 8 seconds a day. These are 
much punier CPU's too. The proposed additional CPU usage is another drop 
in the ocean.

Given Lenovo sold the *same* configuration with x3650's and SR650's the 
"need all the CPU grunt" is somewhat fishy. Between the bid being 
submitted and actual tender award the SR650's came out and we paid a bit 
extra to uplift to the newer server hardware with exactly the same disk 
configuration. I believe IBM have done the same with the ESS/GNR servers 
too over time the same applies there too.

IMHO given keepalived is a base RHEL package, IBM/Lenovo should be 
offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as 
a supported configuration for mixed network technology clusters :-)

Running a couple extra servers for this purpose is obnoxious from an 
environmental standpoint. That's IBM's green credentials out the window 
if you ask me.

I would note under those rules running a Nagios, Zabbix etc. client on 
the nodes is not permitted either. I would suggest that most sites would 
be rather unhappy about that :-)


 > I don't know if Lenovo have different point if view.
 >

Problem is when I ring up for support on my DSS-G I speak to an IBM 
employee not a Lenovo one :-)


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From abeattie at au1.ibm.com  Sat Oct  3 11:55:05 2020
From: abeattie at au1.ibm.com (Andrew Beattie)
Date: Sat, 3 Oct 2020 10:55:05 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <7af0ac41-a280-5ecd-3658-7af761a4bf9b@strath.ac.uk>
Message-ID: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>


Why do you need to run any kind of monitoring client on an IO server the
GUI / performance monitor already does all of that work for you and
collects the data on the dedicated EMS server.

If you have a small storage environment the. Yes the processor and memory
may feel like overkill, but tuned appropriately an IO server will use all
the memory you can give it to drive IO performance,

If you want to run a hybrid / non standard architecture then the IBM ESS /
DGSS platform may not be the right platform in comparison to a build your
own architecture, how ever you then take all the support issues onto your
self rather than it being the vendors problem.

Sent from my iPhone

> On 3 Oct 2020, at 20:06, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk>
wrote:
>
> On 02/10/2020 23:19, Andrew Beattie wrote:
>> Jonathan,
>> I suggest you get a formal statement from Lenovo as the DSS-G Platform
>> is no longer an IBM platform.
>>
>> But for ESS based platforms the answer would be, it is not supported to
>> run anything on the IO Servers other than GNR and the relevant Scale
>> management services, due to the fact that if you lose an IO Server, or
>> if you in an extended maintenance window the Server needs to host all
>> the work that would be being performed by both IO servers.
>>
>
> In the past ~500 days the Infiniband to Ethernet gateway has shifted
> ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the
> DSS-G nodes have each shifted several PB. The proposed additional
> traffic is a drop in the ocean.
>
> On my actual routers which shift much more data (over 300TB externally)
> with an uptime of ~180 days at the moment the CPU time consumed by
> keepalived is just under 31 minutes or about 8 seconds a day. These are
> much punier CPU's too. The proposed additional CPU usage is another drop
> in the ocean.
>
> Given Lenovo sold the *same* configuration with x3650's and SR650's the
> "need all the CPU grunt" is somewhat fishy. Between the bid being
> submitted and actual tender award the SR650's came out and we paid a bit
> extra to uplift to the newer server hardware with exactly the same disk
> configuration. I believe IBM have done the same with the ESS/GNR servers
> too over time the same applies there too.
>
> IMHO given keepalived is a base RHEL package, IBM/Lenovo should be
> offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as
> a supported configuration for mixed network technology clusters :-)
>
> Running a couple extra servers for this purpose is obnoxious from an
> environmental standpoint. That's IBM's green credentials out the window
> if you ask me.
>
> I would note under those rules running a Nagios, Zabbix etc. client on
> the nodes is not permitted either. I would suggest that most sites would
> be rather unhappy about that :-)
>
>
>> I don't know if Lenovo have different point if view.
>>
>
> Problem is when I ring up for support on my DSS-G I speak to an IBM
> employee not a Lenovo one :-)
>
>
> JAB.
>
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201003/b5de5e36/attachment-0001.htm>

From luis.bolinches at fi.ibm.com  Sat Oct  3 12:19:36 2020
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Sat, 3 Oct 2020 11:19:36 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
Message-ID: <OFBC4CB60E.7F91267F-ON002585F6.003E3808-1601723976459@notes.na.collabserv.com>

Are you mixing those ESS DSS in the same cluster? Or you are only running DSS


https://www.ibm.com/support/knowledgecenter/SSYSP8/gnrfaq.html?view=kc#supportqs__building

Mixing DSS and ESS in the same cluster is not a supported configuration.

You really need to talk with Lenovo as is your vendor. The fact that in your region your support is being given by an IBMer or not is not a relevant point. High enough in the chain always will end at IBM on any region as GNR is IBM tech for 17 years (yes 17) so if weird enough even on regions where Lenovo might do even third level it might end on development and/or research. But that is a Lenovo/IBM agreement not you and IBM. 

So please get the support statement from Lenovo about this and pls share it if you want/can so we all learn their position. 

Thanks. 


--
Cheers

> On 3. Oct 2020, at 13.55, Andrew Beattie <abeattie at au1.ibm.com> wrote:
> 
> ?
> Why do you need to run any kind of monitoring client on an IO server the GUI / performance monitor already does all of that work for you and collects the data on the dedicated EMS server.
> 
> If you have a small storage environment the. Yes the processor and memory may feel like overkill, but tuned appropriately an IO server will use all the memory you can give it to drive IO performance, 
> 
> If you want to run a hybrid / non standard architecture then the IBM ESS / DGSS platform may not be the right platform in comparison to a build your own architecture, how ever you then take all the support issues onto your self rather than it being the vendors problem. 
> 
> Sent from my iPhone
> 
> > On 3 Oct 2020, at 20:06, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk> wrote:
> > 
> > On 02/10/2020 23:19, Andrew Beattie wrote:
> >> Jonathan,
> >> I suggest you get a formal statement from Lenovo as the DSS-G Platform 
> >> is no longer an IBM platform.
> >> 
> >> But for ESS based platforms the answer would be, it is not supported to 
> >> run anything on the IO Servers other than GNR and the relevant Scale 
> >> management services, due to the fact that if you lose an IO Server, or 
> >> if you in an extended maintenance window the Server needs to host all 
> >> the work that would be being performed by both IO servers.
> >> 
> > 
> > In the past ~500 days the Infiniband to Ethernet gateway has shifted 
> > ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the 
> > DSS-G nodes have each shifted several PB. The proposed additional 
> > traffic is a drop in the ocean.
> > 
> > On my actual routers which shift much more data (over 300TB externally) 
> > with an uptime of ~180 days at the moment the CPU time consumed by 
> > keepalived is just under 31 minutes or about 8 seconds a day. These are 
> > much punier CPU's too. The proposed additional CPU usage is another drop 
> > in the ocean.
> > 
> > Given Lenovo sold the *same* configuration with x3650's and SR650's the 
> > "need all the CPU grunt" is somewhat fishy. Between the bid being 
> > submitted and actual tender award the SR650's came out and we paid a bit 
> > extra to uplift to the newer server hardware with exactly the same disk 
> > configuration. I believe IBM have done the same with the ESS/GNR servers 
> > too over time the same applies there too.
> > 
> > IMHO given keepalived is a base RHEL package, IBM/Lenovo should be 
> > offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as 
> > a supported configuration for mixed network technology clusters :-)
> > 
> > Running a couple extra servers for this purpose is obnoxious from an 
> > environmental standpoint. That's IBM's green credentials out the window 
> > if you ask me.
> > 
> > I would note under those rules running a Nagios, Zabbix etc. client on 
> > the nodes is not permitted either. I would suggest that most sites would 
> > be rather unhappy about that :-)
> > 
> > 
> >> I don't know if Lenovo have different point if view.
> >> 
> > 
> > Problem is when I ring up for support on my DSS-G I speak to an IBM 
> > employee not a Lenovo one :-)
> > 
> > 
> > JAB.
> > 
> > -- 
> > Jonathan A. Buzzard Tel: +44141-5483420
> > HPC System Administrator, ARCHIE-WeSt.
> > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
> > 
> 

Ellei edell? ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3 
Registered in Finland

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201003/7806094c/attachment-0001.htm>

From jonathan.buzzard at strath.ac.uk  Sat Oct  3 18:16:33 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sat, 3 Oct 2020 18:16:33 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
References: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
Message-ID: <d25eba8f-244b-f610-a49f-8826e77ec6b6@strath.ac.uk>

On 03/10/2020 11:55, Andrew Beattie wrote:
> Why do you need to run any kind of monitoring client on an IO server the 
> GUI / performance monitor already does all of that work for you and 
> collects the data on the dedicated EMS server.

Because any remotely sensible admin demands a single pane service 
monitoring system. If I have to look at A for everything but my DSS-G 
and B for my DSS-G that's an epic fail.

I often feel there is a huge disjuncture between the people that develop 
systems and those that look after them; they think the world revolves 
around them. It is clear this is one of those  cases.

> 
> If you have a small storage environment the. Yes the processor and 
> memory may feel like overkill, but tuned appropriately an IO server will 
> use all the memory you can give it to drive IO performance,

Right but the SR650's came with not only more CPU but more RAM than the 
x3650's. In which case why only 192GB of RAM? The SR650's can take much 
more than that. Why not 384GB of RAM :-) Right now we have a shade over 
50GB of RAM being unused. Been way for like ever because we naughtily 
have a influx DB client setup for a Grafana dashboard. We also 
presumably naughtily have remote syslog to Splunk.

> 
> If you want to run a hybrid / non standard architecture then the IBM ESS 
> / DGSS platform may not be the right platform in comparison to a build 
> your own architecture, how ever you then take all the support issues 
> onto your self rather than it being the vendors problem.
> 
I don't see anything that says you can't have some clients ethernet 
connected and some Infiniband connected. That of course requires a 
gateway, and the most logical place to put it is on the ESS or DSS nodes 
IMHO. I will see what Lenovo has to say, but looks like the IBM position 
is decidedly let's burn the planet, who gives a dam.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From jonathan.buzzard at strath.ac.uk  Sat Oct  3 18:16:39 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sat, 3 Oct 2020 18:16:39 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OFBC4CB60E.7F91267F-ON002585F6.003E3808-1601723976459@notes.na.collabserv.com>
References: <OFBC4CB60E.7F91267F-ON002585F6.003E3808-1601723976459@notes.na.collabserv.com>
Message-ID: <d475e10b-2bea-aac3-3031-3c6dc348ce86@strath.ac.uk>

On 03/10/2020 12:19, Luis Bolinches wrote:
> Are you mixing those ESS DSS in the same cluster? Or you are only 
> running DSS
> 

Only running DSS. We are too far down the rabbit hole to ever switch to 
ESS now.

> 
> Mixing DSS and ESS in the same cluster is not a supported configuration.
>

I know, it means you can never ever migrate your storage from DSS to ESS 
without a full backup and restore. Who with any significant amount of 
storage is going to want to do that? The logic behind this escapes me, 
or perhaps in that scenario IBM might relax the rules for the migration 
period.


> You really need to talk with Lenovo as is your vendor. The fact that in 
> your region your support is being given by an IBMer or not is not a 
> relevant point. High enough in the chain always will end at IBM on any 
> region as GNR is IBM tech for 17 years (yes 17) so if weird enough even 
> on regions where Lenovo might do even third level it might end on 
> development and/or research. But that is a Lenovo/IBM agreement not you 
> and IBM.
> 
> So please get the support statement from Lenovo about this and pls share 
> it if you want/can so we all learn their position.
> 

Will attempt that, though I do think it should be a supported config out 
the box :-)


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From luis.bolinches at fi.ibm.com  Sun Oct  4 10:29:34 2020
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Sun, 4 Oct 2020 09:29:34 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <d475e10b-2bea-aac3-3031-3c6dc348ce86@strath.ac.uk>
Message-ID: <OF078DF2EA.0852E4DE-ON002585F7.00342558-1601803774728@notes.na.collabserv.com>

Hi

As stated on the same link you can do remote mounts from each other and be a supported setup. 

? You can use the remote mount feature of IBM Spectrum Scale to share file system data across clusters.?

--
Cheers

> On 3. Oct 2020, at 20.16, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk> wrote:
> 
> ?On 03/10/2020 12:19, Luis Bolinches wrote:
>> Are you mixing those ESS DSS in the same cluster? Or you are only
>> running DSS
> 
> Only running DSS. We are too far down the rabbit hole to ever switch to 
> ESS now.
> 
>> Mixing DSS and ESS in the same cluster is not a supported configuration.
> 
> I know, it means you can never ever migrate your storage from DSS to ESS 
> without a full backup and restore. Who with any significant amount of 
> storage is going to want to do that? The logic behind this escapes me, 
> or perhaps in that scenario IBM might relax the rules for the migration 
> period.
> 
> 
>> You really need to talk with Lenovo as is your vendor. The fact that in
>> your region your support is being given by an IBMer or not is not a
>> relevant point. High enough in the chain always will end at IBM on any
>> region as GNR is IBM tech for 17 years (yes 17) so if weird enough even
>> on regions where Lenovo might do even third level it might end on
>> development and/or research. But that is a Lenovo/IBM agreement not you
>> and IBM.
>> So please get the support statement from Lenovo about this and pls share
>> it if you want/can so we all learn their position.
> 
> Will attempt that, though I do think it should be a supported config out 
> the box :-)
> 
> 
> JAB.
> 
> -- 
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Ellei edell? ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3 
Registered in Finland

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201004/43585e71/attachment-0001.htm>

From jonathan.buzzard at strath.ac.uk  Sun Oct  4 11:17:30 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sun, 4 Oct 2020 11:17:30 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF078DF2EA.0852E4DE-ON002585F7.00342558-1601803774728@notes.na.collabserv.com>
References: <OF078DF2EA.0852E4DE-ON002585F7.00342558-1601803774728@notes.na.collabserv.com>
Message-ID: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk>

On 04/10/2020 10:29, Luis Bolinches wrote:
> Hi
> 
> As stated on the same link you can do remote mounts from each other and 
> be a supported setup.
> 
> ??You can use the remote mount feature of IBM Spectrum Scale to share 
> file system data across clusters.?
> 

You can, but imagine I have a DSS-G cluster, with 2PB of storage on it 
which is quite modest in 2020. It is now end of life and for whatever 
reason I decide I want to move to ESS instead.

What any sane storage admin want to do at this stage is set the ESS, add 
the ESS nodes to the existing cluster on the DSS-G then do a bit of 
mmadddisk/mmdeldisk and sit back while the data is seemlessly moved from 
the DSS-G to the ESS. Admittedly this might take a while :-)

Then once all the data is moved a bit of mmdelnode and bingo the storage 
has been migrated from DSS-G to ESS with zero downtime.

As that is not allowed for what I presume are commercial reasons (you 
could do it in reverse and presumable that is what IBM don't want) then 
once you are down the rabbit hole of one type of storage the you are not 
going to switch to a different one.

You need to look at it from the perspective of the users. They frankly 
could not give a monkeys what storage solution you are using. All they 
care about is having usable storage and large amounts of downtime to 
switch from one storage type to another is not really acceptable.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From olaf.weiser at de.ibm.com  Mon Oct  5 07:19:40 2020
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Mon, 5 Oct 2020 06:19:40 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk>
References: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk>,
	<OF078DF2EA.0852E4DE-ON002585F7.00342558-1601803774728@notes.na.collabserv.com>
Message-ID: <OF7BE40A26.30F06268-ON002585F8.001E36FC-002585F8.0022C2BE@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/47f72836/attachment-0001.htm>

From jordi.caubet at es.ibm.com  Mon Oct  5 07:27:39 2020
From: jordi.caubet at es.ibm.com (Jordi Caubet Serrabou)
Date: Mon, 5 Oct 2020 06:27:39 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF7BE40A26.30F06268-ON002585F8.001E36FC-002585F8.0022C2BE@notes.na.collabserv.com>
Message-ID: <OF2E966B1D.F37B6B20-ON002585F8.00237D51-1601879259249@notes.na.collabserv.com>

?Coming to the routing point, is there any reason why you need it ? I mean, this is because GPFS trying to connect between compute nodes or a reason outside GPFS scope ?
If the reason is GPFS,  imho best approach - without knowledge of the licensing you have - would be to use separate clusters: a storage cluster and two compute clusters.

Both compute clusters join using multicluster setup the storage cluster. There is no need both compute clusters see each other, they only need to see the storage cluster. One of the clusters using the 10G, the other cluster using the IPoIB interface.
You need at least three quorum nodes in each compute cluster but if licensing is per drive on the DSS, it is covered.
--
Jordi Caubet Serrabou
IBM Software Defined Infrastructure (SDI) and Flash Technical Sales Specialist
Technical Computing and HPC IT Specialist and Architect
Ext. Phone: (+34) 679.79.17.84 (internal 55834)
E-mail: jordi.caubet at es.ibm.com

> On 5 Oct 2020, at 08:19, Olaf Weiser <olaf.weiser at de.ibm.com> wrote:
> 
> ?
> let me add a few comments from some very successful large installations in Eruope
>  
> # InterOP
> Even though (as Luis pointed to) , there is no support statement to run intermix DSS/ESS in general, it was ~, and is, and will be, ~ allowed for short term purposes, such as e.g migration.
> The reason to not support those DSS/ESS mixed configuration in general is simply driven by the fact, that different release version of DSS/ESS potentially (not in every release, but sometimes)  comes with different driver levels, (e.g. MOFED), OS, RDMA-settings, GPFS tuning,  etc...
> Those changes can have an impact/multiple impacts and therefore, we do not support that in general. Of course -and this would be the advice for every one - if you are faced the need to run a mixed configuration for e.g. a migration and/or e.g. cause of you need to temporary provide space etc... contact you IBM representative and settle to plan that accordingly..
> There will be (likely) some additional requirements/dependencies defined  like  driver versions, OS,  and/or Scale versions, but you'll get a chance to run mixed configuration - temporary limited to your specific scenario.
>  
> # Monitoring
> No doubt, monitoring is essential and absolutely needed. - and/but - IBM wants customers to be very sensitive, what kind of additional software (=workload) gets installed on the ESS-IO servers. BTW, this rule applies as well to any other important GPFS node with special roles (e.g. any other NSD server etc)
> But given the fact, that customer's usually manage and monitor their server farms from a central point of control (any 3rd party software), it is common/ best practice , that additionally monitor software(clients/endpoints) has to run on GPFS nodes, so as on ESS nodes too.
>  
> If that way of acceptance applies for DSS too, you may want to double check with Lenovo ?!
>  
>  
> #additionally GW functions
> It would be a hot iron, to general allow routing on IO nodes. Similar to the mixed support approach, the field variety for such a statement would be hard(==impossible) to manage. As we all agree, additional network traffic can (and in fact will) impact GPFS.
> In your special case, the expected data rates seems to me more than ok and acceptable to go with your suggested config (as long workloads remain on that level / monitor it accordingly as you are already obviously doing) 
> Again,to be on the safe side.. contact your IBM representative and I'm sure you 'll find a way..
>  
>  
>  
> kind regards....
> olaf
>  
>  
> ----- Original message -----
> From: Jonathan Buzzard <jonathan.buzzard at strath.ac.uk>
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug-discuss at spectrumscale.org
> Cc:
> Subject: [EXTERNAL] Re: [gpfsug-discuss] Services on DSS/ESS nodes
> Date: Sun, Oct 4, 2020 12:17 PM
>  
> On 04/10/2020 10:29, Luis Bolinches wrote:
> > Hi
> >
> > As stated on the same link you can do remote mounts from each other and
> > be a supported setup.
> >
> > ? You can use the remote mount feature of IBM Spectrum Scale to share
> > file system data across clusters.?
> >
> 
> You can, but imagine I have a DSS-G cluster, with 2PB of storage on it
> which is quite modest in 2020. It is now end of life and for whatever
> reason I decide I want to move to ESS instead.
> 
> What any sane storage admin want to do at this stage is set the ESS, add
> the ESS nodes to the existing cluster on the DSS-G then do a bit of
> mmadddisk/mmdeldisk and sit back while the data is seemlessly moved from
> the DSS-G to the ESS. Admittedly this might take a while :-)
> 
> Then once all the data is moved a bit of mmdelnode and bingo the storage
> has been migrated from DSS-G to ESS with zero downtime.
> 
> As that is not allowed for what I presume are commercial reasons (you
> could do it in reverse and presumable that is what IBM don't want) then
> once you are down the rabbit hole of one type of storage the you are not
> going to switch to a different one.
> 
> You need to look at it from the perspective of the users. They frankly
> could not give a monkeys what storage solution you are using. All they
> care about is having usable storage and large amounts of downtime to
> switch from one storage type to another is not really acceptable.
> 
> 
> JAB.
> 
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
>  
>  
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 

Salvo indicado de otro modo m?s arriba / Unless stated otherwise above:
International Business Machines, S.A.
Santa Hortensia, 26-28, 28002 Madrid
Registro Mercantil de Madrid; Folio 1; Tomo 1525; Hoja M-28146
CIF A28-010791

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/1846e9d6/attachment-0001.htm>

From S.J.Thompson at bham.ac.uk  Mon Oct  5 09:40:56 2020
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Mon, 5 Oct 2020 08:40:56 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
Message-ID: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>

>    I now need to check IBM are not going to throw a wobbler down the line 
>    if I need to get support before deploying it to the DSS-G nodes :-)

I know there were a lot of other emails about this ...

I think you maybe want to be careful doing this. Whilst it might work when you setup the DSS-G like this, remember that the memory usage you are seeing at this point in time may not be what you always need. For example if you fail-over the recovery groups, you need to have enough free memory to handle this. E.g. a node failure, or more likely you are upgrading the building blocks.

Personally I wouldn't run other things like this on my DSS-G storage nodes. We do run e.g. nrpe monitoring to collect and report faults, but this is pretty lightweight compared to everything else. They even removed support for running the gui packages on the IO nodes - the early DSS-G builds used the IO nodes for this, but now you need separate systems for this.

Simon


From jonathan.buzzard at strath.ac.uk  Mon Oct  5 12:44:48 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Mon, 5 Oct 2020 12:44:48 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>
References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
	<73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>
Message-ID: <905a0bdb-b6a1-90e4-bf57-ed8edae6fb7c@strath.ac.uk>


On 05/10/2020 07:27, Jordi Caubet Serrabou wrote:

 > ?Coming to the routing point, is there any reason why you need it ? I
 > mean, this is because GPFS trying to connect between compute nodes or
 > a reason outside GPFS scope ?
 > If the reason is GPFS,  imho best approach - without knowledge of the
 > licensing you have - would be to use separate clusters: a storage
 > cluster and two compute clusters.

The issue is that individual nodes want to talk to one another on the 
data interface. Which caught me by surprise as the cluster is set to 
admin mode central.

The admin interface runs over ethernet for all nodes on a specific VLAN 
which which is given 802.1p priority 5 (that's Voice, < 10 ms latency 
and jitter). That saved a bunch of switching and cabling as you don't 
need the extra interface for the admin traffic. The cabling already 
significantly restricts airflow for a compute rack as it is, without 
adding a whole bunch more for a barely used admin interface.

It's like the people who wrote the best practice about separate 
interface for the admin traffic know very little about networking to be 
frankly honest. This is all last century technology.

The nodes for undergraduate teaching only have a couple of 1Gb ethernet 
ports which would suck for storage usage. However they also have QDR 
Infiniband. That is because even though undergraduates can't run 
multinode jobs, on the old cluster the Lustre storage was delivered over 
Infiniband, so they got Infiniband cards.

 > Both compute clusters join using multicluster setup the storage
 > cluster. There is no need both compute clusters see each other, they
 > only need to see the storage cluster. One of the clusters using the
 > 10G, the other cluster using the IPoIB interface.
 > You need at least three quorum nodes in each compute cluster but if
 > licensing is per drive on the DSS, it is covered.

Three clusters is starting to get complicated from an admin perspective. 
The biggest issue is coordinating maintenance and keep sufficient quorum 
nodes up.

Maintenance on compute nodes is done via the job scheduler. I know some 
people think this is crazy, but it is in reality extremely elegant.

We can schedule a reboot on a node as soon as the current job has 
finished (usually used for firmware upgrades). Or we can schedule a job 
to run as root (usually for applying updates) as soon as the current job 
has finished. As such we have no way of knowing when that will be for a 
given node, and there is a potential for all three quorum nodes to be 
down at once.

Using this scheme we can seamlessly upgrade the nodes safe in the 
knowledge that a node is either busy and it's running on the current 
configuration or it has been upgraded and is running the new 
configuration. Consequently multinode jobs are guaranteed to have all 
nodes in the job running on the same configuration.

The alternative is to drain the node, but there is only a 23% chance the 
node will become available during working hours leading to a significant 
loss of compute time when doing maintenance compared to our existing 
scheme where the loss of compute time is only as long as the upgrade 
takes to install. Pretty much the only time we have idle nodes is when 
the scheduler is reserving nodes ready to schedule a multi node job.

Right now we have a single cluster with the quorum nodes being the two 
DSS-G nodes and the node used for backup. It is easy to ensure that 
quorum is maintained on these, they also all run real RHEL, where as the 
compute nodes run CentOS.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From carlz at us.ibm.com  Mon Oct  5 13:09:02 2020
From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com)
Date: Mon, 5 Oct 2020 12:09:02 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
Message-ID: <714B599F-D06D-4D03-98F3-A2BF6F7360DB@us.ibm.com>


Jordi wrote:
?Both compute clusters join using multicluster setup the storage cluster. There is no need both compute clusters see each other, they only need to see the storage cluster. One of the clusters using the 10G, the other cluster using the IPoIB interface.
You need at least three quorum nodes in each compute cluster but if licensing is per drive on the DSS, it is covered.?

As a side note: One of the reasons we designed capacity (per Disk or per TB) licensing the way we did was specifically so that you could make this kind of architectural decision on its own merits, without worrying about a licensing penalty.


Carl Zetie
Program Director
Offering Management
Spectrum Scale
----
(919) 473 3318 ][ Research Triangle Park
carlz at us.ibm.com

[signature_1243111775]


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/7eb5d683/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 69558 bytes
Desc: image001.png
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/7eb5d683/attachment-0001.png>

From carlz at us.ibm.com  Mon Oct  5 13:20:25 2020
From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com)
Date: Mon, 5 Oct 2020 12:20:25 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
Message-ID: <288C3527-32BA-43E2-B5EF-E79CC5765424@us.ibm.com>

>> Mixing DSS and ESS in the same cluster is not a supported configuration.
>
> I know, it means you can never ever migrate your storage from DSS to ESS
> without a full backup and restore. Who with any significant amount of
> storage is going to want to do that? The logic behind this escapes me,
> or perhaps in that scenario IBM might relax the rules for the migration
> period.
>

We do indeed relax the rules temporarily for a migration.

The reasoning behind this rule is for support. Many Scale support issues - often the toughest ones - are not about a single node, but about the cluster or network as a whole. So if you have a mix of IBM systems with systems supported by an OEM (this applies to any OEM by the way, not just Lenovo) and a cluster-wide issue, who are you going to call. (Well, in practice you?re going to call IBM and we?ll do our best to help you despite limits on our knowledge of the OEM systems?).

--CZ


Carl Zetie
Program Director
Offering Management
Spectrum Scale
----
(919) 473 3318 ][ Research Triangle Park
carlz at us.ibm.com

[signature_386371469]


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/8629eaed/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 69558 bytes
Desc: image001.png
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/8629eaed/attachment-0001.png>

From jonathan.buzzard at strath.ac.uk  Mon Oct  5 14:39:12 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Mon, 5 Oct 2020 14:39:12 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>
References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
	<73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>
Message-ID: <abf37ce3-aa29-b4e1-fab6-12673b7aad67@strath.ac.uk>

On 05/10/2020 09:40, Simon Thompson wrote:
>> I now need to check IBM are not going to throw a wobbler down the
>> line if I need to get support before deploying it to the DSS-G
>> nodes :-)
> 
> I know there were a lot of other emails about this ...
> 
> I think you maybe want to be careful doing this. Whilst it might work
> when you setup the DSS-G like this, remember that the memory usage
> you are seeing at this point in time may not be what you always need.
> For example if you fail-over the recovery groups, you need to have
> enough free memory to handle this. E.g. a node failure, or more
> likely you are upgrading the building blocks.

I think there is a lack of understanding on exactly how light weight 
keepalived is.

It's the same code as on my routers which are admittedly different CPU's 
(MIPS to be precise) but memory usage (taking out shared memory usage - 
libc for example is loaded anyway) is under 200KB. A bash shell uses 
more memory...

> 
> Personally I wouldn't run other things like this on my DSS-G storage
> nodes. We do run e.g. nrpe monitoring to collect and report faults,
> but this is pretty lightweight compared to everything else. They even
> removed support for running the gui packages on the IO nodes - the
> early DSS-G builds used the IO nodes for this, but now you need
> separate systems for this.
> 

And keepalived is in the same range as nrpe, which you do run :-) I have 
seen nrpe get out of hand and consume significant amounts of resources 
on a machine; the machine was ground to halt due to nrpe. One of the 
standard plugins was failing and sitting their busy waiting. Every five 
minutes it ran again. It of course decided to wait till ~7pm on a Friday 
to go wonky. By mid morning on Saturday it was virtually unresponsive, 
several minutes to get a shell...

I would note that you can run keepalived quite happily on an Ubiquiti 
EdgeRouter X which has a dual core 880 MHz MIPS CPU with 256MB of RAM. 
Mikrotik have models with similar specs that run it too.

On a dual Xeon Gold 6142 machine the usage of RAM and CPU by keepalived 
is noise.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From committee at io500.org  Thu Oct  1 17:40:00 2020
From: committee at io500.org (committee at io500.org)
Date: Thu, 01 Oct 2020 10:40:00 -0600
Subject: [gpfsug-discuss] IO500 SC20 Call for Submission
Message-ID: <4a20ed6ae985a25c69d953e1ea633d62@io500.org>

CALL FOR IO500 SUBMISSION

Deadline: 30 October 2020 AoE 

Stabilization period: 1st October -- 9th October 2020 AoE 

The IO500 [1] is now accepting and encouraging submissions for the
upcoming 7th IO500 list, to be revealed at the IO500 Virtual BOF during
SC20. Once again, we are also accepting submissions to the 10 Node I/O
Challenge to encourage submission of small scale results. The new ranked
lists will be announced at our Virtual SC20 BoF. We hope to see you, and
your results, there.  

A new change for the upcoming submission procedure is the introduction
of a stabilization period that aims to harden the benchmark. The final
benchmark is released at the end of this period. During the
stabilization we encourage the community to test the proper execution of
the benchmark and provide us with feedback. We will apply bug fixes to
the code base and expect that results obtained will be valid as full
submission. We also continue with another list for the Student Cluster
Competition, since IO500 is used during this competition. 

Also new this year is that we have partnered with Anthony Kougkas' team
at Illinois Institute of Technology to evaluate the submission metadata
describing the storage system on which the test was run to improve the
quality and usefulness of the data IO500 collects. You may be contacted
by one of his students to clarify one or more of the metadata items from
your submission(s). We would appreciate, but do not require, your
cooperation to help improve the submission metadata quality. Results
from their work will be fed back to improve our submission process for
future lists. 

The IO500 benchmark suite is designed to be easy to run, and the
community has multiple active support channels to help with any
questions. Please submit results from your system, and we look forward
to seeing many of you at SC20! Please note that submissions of all sizes
are welcome, including multiple submissions from different storage
systems/tiers at a single site.  The website has customizable sorting so
it is possible to submit on a small system and still get a very good
per-client score, for example. Additionally, the list is about much more
than just the raw rank; all submissions help the community by collecting
and publishing a wider corpus of data. More details below. 

Following the success of the Top500 in collecting and analyzing
historical trends in supercomputer technology and evolution, the IO500
[1] was created in 2017, published its first list at SC17, and has grown
continuously since then. The need for such an initiative has long been
known within High-Performance Computing; however, defining appropriate
benchmarks had long been challenging. Despite this challenge, the
community, after long and spirited discussion, finally reached consensus
on a suite of benchmarks and a metric for resolving the scores into a
single ranking. 

The multi-fold goals of the benchmark suite are as follows: 

 	* Maximizing simplicity in running the benchmark suite
 	* Encouraging complexity in tuning for performance
 	* Allowing submitters to highlight their "hero run" performance
numbers
 	* Forcing submitters to simultaneously report performance for
challenging IO patterns.

Specifically, the benchmark suite includes a hero-run of both IOR and
mdtest configured however possible to maximize performance and establish
an upper-bound for performance. It also includes an IOR and mdtest run
with highly prescribed parameters in an attempt to determine a
lower-bound on the performance. Finally, it includes a namespace search,
as this has been determined to be a highly sought-after feature in HPC
storage systems that have historically not been well-measured.
Submitters are encouraged to share their tuning insights for
publication. 

The goals of the community are also multi-fold: 

 	* Gather historical data for the sake of analysis and to aid
predictions of storage futures
 	* Collect tuning information to share valuable performance
optimizations across the community
 	* Encourage vendors and designers to optimize for workloads beyond
"hero runs"
 	* Establish bounded expectations for users, procurers, and
administrators

10 NODE I/O CHALLENGE

The 10 Node Challenge is conducted using the regular IO500 benchmark,
however, with the rule that exactly 10 client nodes must be used to run
the benchmark. You may use any shared storage with, e.g., any number of
servers. When submitting for the IO500 list, you can opt-in for
"Participate in the 10 compute node challenge only", then we will not
include the results into the ranked list. Other 10-node node submissions
will be included in the full list and in the ranked list. We will
announce the result in a separate derived list and in the full list but
not on the ranked IO500 list at https://io500.org/ [2]  

BIRDS-OF-A-FEATHER

Once again, we encourage you to submit [1], to join our community, and
to attend our virtual BoF "The IO500 and the Virtual Institute of I/O"
at SC20, where we will announce the new IO500 list, the 10 node
challenge list, and the Student Cluster Competition list. We look
forward to answering any questions or concerns you might have. 

 	* [1] http://www.vi4io.org/io500/submission [3] 

Thanks, 

The IO500 Committee <committee at io500.org> 

 
Links:
------
[1] http://io500.org/
[2] https://io500.org/
[3] http://www.vi4io.org/io500/submission
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201001/7611b021/attachment-0001.htm>

From valdis.kletnieks at vt.edu  Wed Oct  7 00:45:46 2020
From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks)
Date: Tue, 06 Oct 2020 19:45:46 -0400
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
References: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
Message-ID: <138651.1602027946@turing-police>

On Sat, 03 Oct 2020 10:55:05 -0000, "Andrew Beattie" said:

> Why do you need to run any kind of monitoring client on an IO server the
> GUI / performance monitor already does all of that work for you and
> collects the data on the dedicated EMS server.

Does *ALL* that work for me?

Will it toss you an alert if your sshd goes away, or if somebody's tossing
packets that iptables is blocking for good reasons, or any of the many other
things that a competent sysadmin wants to be alerted on that aren't GPFS, but
which are things that Nagios and Zabbix and similar tools were invented
to track?


-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201006/822e7ca6/attachment-0001.sig>

From S.J.Thompson at bham.ac.uk  Wed Oct  7 11:28:55 2020
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Wed, 7 Oct 2020 10:28:55 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <138651.1602027946@turing-police>
References: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
	<138651.1602027946@turing-police>
Message-ID: <FB188F2A-6E9D-49C5-8663-87547FD35BE3@bham.ac.uk>

Agreed ...

Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*.
Tell me that kswapd is having one of those days.
Tell me rsyslogd has stopped sending for some reason.
Tell me if there are long waiters on the hosts.
Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ...

Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems...

Simon

?On 07/10/2020, 00:45, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Valdis Kl?tnieks" <gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu> wrote:

    On Sat, 03 Oct 2020 10:55:05 -0000, "Andrew Beattie" said:

    > Why do you need to run any kind of monitoring client on an IO server the
    > GUI / performance monitor already does all of that work for you and
    > collects the data on the dedicated EMS server.

    Does *ALL* that work for me?

    Will it toss you an alert if your sshd goes away, or if somebody's tossing
    packets that iptables is blocking for good reasons, or any of the many other
    things that a competent sysadmin wants to be alerted on that aren't GPFS, but
    which are things that Nagios and Zabbix and similar tools were invented
    to track?


From jonathan.buzzard at strath.ac.uk  Wed Oct  7 13:14:45 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Wed, 7 Oct 2020 13:14:45 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <FB188F2A-6E9D-49C5-8663-87547FD35BE3@bham.ac.uk>
References: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
	<138651.1602027946@turing-police>
	<FB188F2A-6E9D-49C5-8663-87547FD35BE3@bham.ac.uk>
Message-ID: <e8cf9330-cf1f-e7ad-777e-6aeaade9b29b@strath.ac.uk>

On 07/10/2020 11:28, Simon Thompson wrote:
> Agreed ...
> 
> Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*.
> Tell me that kswapd is having one of those days.
> Tell me rsyslogd has stopped sending for some reason.
> Tell me if there are long waiters on the hosts.
> Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ...
> 
> Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems...
> 

The problem is the developers know as much about looking after a system 
in the real world as a tea leaf knows the history of the East India 
Company. IMHO to even ask the question shows a total lack of 
understanding of the issue.

Consequently developers in their ivory towers have a habit of developing 
things that are as useful as a chocolate tea pot. Which putting it 
bluntly a competent sysadmins makes them look like a bunch of twits. I 
would note this is not a problem unique to IBM, it's developers in general.

The appropriate course of action would be not for IBM to develop a 
monitoring tool of their own but to provide a bunch of plugins for the 
popular monitoring tools that customers will already be using to monitor 
their whole IT estate.

Heaven forbid they could even run a poll to find out which ones the 
actual customers of their products are interested in rather than wasting 
effort developing software their customers are not actually interested in.

For my purposes there is I think an alternative. The actual routing of 
the IP packets is not a service, it's a kernel configuration to have the 
kernel route that packets :-) Keepalived just manages a floating IP 
address. There are other options to achieve this. They are clunkier but 
they side step IBM's silly rules.

I would however note at this point that at lots of sites all routing in 
the data centre is done using BGP. It comes in part out of the zero 
trust paradigm. I guess apparently running fail2ban is not permitted 
either. Can I even run firewalld? As you can seen a nothing else policy 
quickly becomes unsustainable IMHO.

There is a disjuncture between the developers in their ivory towers and 
the real world.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From kkr at lbl.gov  Tue Oct 13 22:34:23 2020
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Tue, 13 Oct 2020 14:34:23 -0700
Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a
	panel?
Message-ID: <B8A6C2AA-B4B0-473B-B7EC-C751F0EC9D7C@lbl.gov>

Hi all,

	By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we?re planning a couple 90-minute sessions and would like to do a panel during one of them. We?ll hope to do live Q&A, like an in-person Ask Me Anything session, but it?s probably a good idea to have a bank of questions ready as well, Also, that way your question may get asked, even if you can?t make the live session ?we?ll record these sessions for later viewing. 

	So, please send your questions for the panel and we?ll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots:

November 16th - 8:00 AM Pacific/3:00 PM UTC 

November 18th - 8:00 AM Pacific/3:00 PM UTC 

Best,
Kristy

Kristy Kallback-Rose
Senior HPC Storage Systems Analyst
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201013/4bb40577/attachment-0001.htm>

From juergen.hannappel at desy.de  Wed Oct 21 17:13:01 2020
From: juergen.hannappel at desy.de (Hannappel, Juergen)
Date: Wed, 21 Oct 2020 18:13:01 +0200 (CEST)
Subject: [gpfsug-discuss] Mounting an nfs share on a CES node
Message-ID: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de>

Hi,
I have a CES node exporting some filesystems vis smb and ganesha in a standard CES setup.
Now I want to mount a nfs share from a different, non-CES server on this CES node.
This did not work:
mount -o -fstype=nfs4,minorversion=1,rw,rsize=65536,wsize=65536 some.other.server:/some/path /mnt/
mount.nfs: mount to NFS server 'some.other.server:/some/path' failed: RPC Error: Program unavailable

Does the CES software stack interfere with the nfs client setup? It seems that at least with
rpc-statd there is some conflict:

 systemctl status rpc-statd
? rpc-statd.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2020-10-21 17:48:21 CEST; 22min ago
  Process: 19896 ExecStart=/usr/sbin/rpc.statd $STATDARGS (code=exited, status=1/FAILURE)

Oct 21 17:48:21 mynode systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Oct 21 17:48:21 mynode rpc.statd[19896]: Statd service already running!
Oct 21 17:48:21 mynode systemd[1]: rpc-statd.service: control process exited, code=exited status=1
Oct 21 17:48:21 mynode systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
Oct 21 17:48:21 mynode systemd[1]: Unit rpc-statd.service entered failed state.
Oct 21 17:48:21 mynode systemd[1]: rpc-statd.service failed.
-- 
Dr. J?rgen Hannappel  DESY/IT    Tel.  : +49 40 8998-4616


From mnaineni at in.ibm.com  Thu Oct 22 04:38:59 2020
From: mnaineni at in.ibm.com (Malahal R Naineni)
Date: Thu, 22 Oct 2020 03:38:59 +0000
Subject: [gpfsug-discuss] Mounting an nfs share on a CES node
In-Reply-To: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de>
References: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de>
Message-ID: <OF616CC8F9.AD263CD8-ON00258609.0012EE72-00258609.00140C7B@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201022/5b5db3bb/attachment-0001.htm>

From andi at christiansen.xxx  Tue Oct 27 11:46:02 2020
From: andi at christiansen.xxx (Andi Christiansen)
Date: Tue, 27 Oct 2020 12:46:02 +0100 (CET)
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
Message-ID: <1109480230.484366.1603799162955@privateemail.com>

Hi all,


We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API..


Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API.


We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me.

The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl.


If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ?

Thank you in advance!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201027/2ffb4d78/attachment-0001.htm>

From NISHAAN at za.ibm.com  Tue Oct 27 13:38:01 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Tue, 27 Oct 2020 15:38:01 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <1109480230.484366.1603799162955@privateemail.com>
References: <1109480230.484366.1603799162955@privateemail.com>
Message-ID: <OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>

Hi Andi

The current S3 compatibility in Spectrum Scale is delivered via the Swift3
middleware. This middleware has since been replaced by s3api in later
versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release
which will include the new s3api middleware.

I've tested the S3 compatibility with a few applications including Spectrum
Scale itself (i.e. Cloud Data Sharing to another Scale Object store using
S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran
a few application tools to test for an S3 compliant object stores and again
had no issues.

You can use s3compat to test the current compatibility.. Or you can check
here for the current compatibility..
https://docs.openstack.org/swift/latest/s3_compat.html

Not sure if there is any other way to talk HTTPS without using HAProxy.

In any case, I've documented the process to setup an S3 compliant object
store including supporting virtual-hosted style bucket addressing which you
can find here..

https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/

Most storage vendors including minio would not support the full S3 API
stack as alot of the calls are specific to AWS (like billing stuff etc.).

Anyway, good luck with your testing.

Kind Regards

Nishaan Docrat
Client Technical Specialist - Storage Systems
IBM Systems Hardware

Work: +27 (0)11 302 5001
Mobile: +27 (0)81 040 3793
Email: nishaan at za.ibm.com


From:	Andi Christiansen <andi at christiansen.xxx>
To:	"gpfsug-discuss at spectrumscale.org"
            <gpfsug-discuss at spectrumscale.org>
Date:	2020/10/27 13:59
Subject:	[EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API.
Sent by:	gpfsug-discuss-bounces at spectrumscale.org


Hi all,


We have over a longer period used the S3 API within spectrum Scale.. And
that has shown that it does not support very many applications because of
limitations of the API..


Has anyone got any experience with any other product we can deploy on-top
of Spectrum Scale that will give us a true S3 API with full functionalities
and able to answer on port 443? As of now we use HAProxy to forware ssl
request back and forth from Scale S3 API.


We have looked at MinIO which seems to be fairly simple and maybe might
solve a lot of incompatibilities with clients software. But the product
seems to be very badly documented at least for me.

The idea is basically that a client uses their backup application(rubrik,
veeam etc.) to connect to a domain(for example backup.mycompany.com) with
their access and secret key and have access to their bucket only. and it
must be over https/ssl.


If someone has any knowledge to minio or any other product that might solve
our problem I will be glad to hear from you! ?

Thank you in advance!
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201027/70cc1bc2/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 52733301.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201027/70cc1bc2/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201027/70cc1bc2/attachment-0001.gif>

From andi at christiansen.xxx  Wed Oct 28 06:24:52 2020
From: andi at christiansen.xxx (Andi Christiansen)
Date: Wed, 28 Oct 2020 07:24:52 +0100 (CET)
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>
References: <1109480230.484366.1603799162955@privateemail.com>
	<OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>
Message-ID: <2126571944.509878.1603866292369@privateemail.com>

Hi Nishaan,

Thanks for you reply.

When you say 5.1? is that 5.1.x.x or 5.0.5.1?

Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? 


i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example...

I looked through your documentation and its very close to what we have set up today and found to not work... 

If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen..

Best Regards
Andi Christiansen

>     On 10/27/2020 2:38 PM Nishaan Docrat <nishaan at za.ibm.com> wrote:
> 
> 
> 
>     Hi Andi
> 
>     The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware.
> 
>     I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues.
> 
>     You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html https://docs.openstack.org/swift/latest/s3_compat.html
> 
>     Not sure if there is any other way to talk HTTPS without using HAProxy.
> 
>     In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here..
> 
>     https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/
> 
>     Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.).
> 
>     Anyway, good luck with your testing.
> 
>     Kind Regards
> 
>     Nishaan Docrat
>     Client Technical Specialist - Storage Systems
>     IBM Systems Hardware
> 
>     Work: +27 (0)11 302 5001
>     Mobile: +27 (0)81 040 3793
>     Email: nishaan at za.ibm.com http://www.ibm.com/storage
> 
> 
> 
>     [Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withi]Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that
> 
>     From: Andi Christiansen <andi at christiansen.xxx>
>     To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
>     Date: 2020/10/27 13:59
>     Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API.
>     Sent by: gpfsug-discuss-bounces at spectrumscale.org
> 
> 
>     ---------------------------------------------
> 
> 
> 
>     Hi all,
> 
> 
> 
>     We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API..
> 
> 
> 
>     Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API.
> 
> 
> 
>     We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me.
> 
>     The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl.
> 
> 
> 
>     If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ?
> 
>     Thank you in advance!
>     _______________________________________________
>     gpfsug-discuss mailing list
>     gpfsug-discuss at spectrumscale.org
>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
> 
> 
> 
>     _______________________________________________
>     gpfsug-discuss mailing list
>     gpfsug-discuss at spectrumscale.org
>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/b16944e4/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 52733301.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/b16944e4/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/b16944e4/attachment-0001.gif>

From NISHAAN at za.ibm.com  Wed Oct 28 06:45:29 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Wed, 28 Oct 2020 08:45:29 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <2126571944.509878.1603866292369@privateemail.com>
References: <1109480230.484366.1603799162955@privateemail.com>
	<OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>
	<2126571944.509878.1603866292369@privateemail.com>
Message-ID: <OF4063F396.B7DA9D54-ON4225860F.0023CA79-4225860F.00251FA3@notes.na.collabserv.com>

Hi Andi

The s3api middleware does support multipart uploads..

https://docs.openstack.org/swift/latest/s3_compat.html

The current version of Swift (PIKE) that is bundled with Spectrum Scale
5.0.X doesn't..
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adm_ManagingOpenStackACLsviaAmazonS3API.htm

According to the Spectrum Scale Roadmap, 5.1 is due out 2H20.. Not sure if
someone from development can confirm the GA date.

Does Veeam have a test utility? You could always test it using the current
Swift AIO or if you can provide me with a test utility I can test that for
you.

Kind Regards

Nishaan Docrat
Client Technical Specialist - Storage Systems
IBM Systems Hardware

Work: +27 (0)11 302 5001
Mobile: +27 (0)81 040 3793
Email: nishaan at za.ibm.com


From:	Andi Christiansen <andi at christiansen.xxx>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>,
            Nishaan Docrat <NISHAAN at za.ibm.com>
Date:	2020/10/28 08:24
Subject:	[EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.


Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or
5.0.5.1? Some...
                                                                            
                                                                            
                   This Message Is From an External Sender                  
                   This message came from outside your organization.        
                                                                            
                                                                            
Hi Nishaan,

Thanks for you reply.

When you say 5.1? is that 5.1.x.x or 5.0.5.1?

Some of the limitations we have encountered is the multipart upload not
being supported and some md5sum that the s3 api does that veeam actually
dont like. also interms of the management on the Scale GUI, that has to be
on one of the S3 CES nodes in order to be able to show project, container
etc... but when you have a HAProxy for enabling SSL then a GUI is not
available as they both use port 443?


i know min.io is not the full stack of S3 API commands but as far as i can
read it comes with more features out of the box than Scale S3 does,
multipart for an example...

I looked through your documentation and its very close to what we have set
up today and found to not work...

If multipart uploads would be supported today on scale S3 i would think
about still using scale for the s3 part but as i expect that you talk about
5.1.x.x i dont see that being released any time soon? and dont know if that
is actually going to be supported in that release then i cant wait for that
to happen..

Best Regards
Andi Christiansen
      On 10/27/2020 2:38 PM Nishaan Docrat <nishaan at za.ibm.com> wrote:


      Hi Andi

      The current S3 compatibility in Spectrum Scale is delivered via the
      Swift3 middleware. This middleware has since been replaced by s3api
      in later versions of Swift. Spectrum Scale 5.1 will make use of Swift
      Train release which will include the new s3api middleware.

      I've tested the S3 compatibility with a few applications including
      Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale
      Object store using S3 API) and Spectrum Protect etc. and haven't had
      any issue. I've also ran a few application tools to test for an S3
      compliant object stores and again had no issues.

      You can use s3compat to test the current compatibility.. Or you can
      check here for the current compatibility..
      https://docs.openstack.org/swift/latest/s3_compat.html

      Not sure if there is any other way to talk HTTPS without using
      HAProxy.

      In any case, I've documented the process to setup an S3 compliant
      object store including supporting virtual-hosted style bucket
      addressing which you can find here..

      https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/


      Most storage vendors including minio would not support the full S3
      API stack as alot of the calls are specific to AWS (like billing
      stuff etc.).

      Anyway, good luck with your testing.

      Kind Regards

      Nishaan Docrat
      Client Technical Specialist - Storage Systems
      IBM Systems Hardware

      Work: +27 (0)11 302 5001
      Mobile: +27 (0)81 040 3793
      Email: nishaan at za.ibm.com


      Inactive hide details for Andi Christiansen ---2020/10/27
      13:59:30---Hi all, We have over a longer period used the S3 API withi
      Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a
      longer period used the S3 API within spectrum Scale.. And that has
      shown that

      From: Andi Christiansen <andi at christiansen.xxx>
      To: "gpfsug-discuss at spectrumscale.org"
      <gpfsug-discuss at spectrumscale.org>
      Date: 2020/10/27 13:59
      Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API.
      Sent by: gpfsug-discuss-bounces at spectrumscale.org


      Hi all,


      We have over a longer period used the S3 API within spectrum Scale..
      And that has shown that it does not support very many applications
      because of limitations of the API..


      Has anyone got any experience with any other product we can deploy
      on-top of Spectrum Scale that will give us a true S3 API with full
      functionalities and able to answer on port 443? As of now we use
      HAProxy to forware ssl request back and forth from Scale S3 API.


      We have looked at MinIO which seems to be fairly simple and maybe
      might solve a lot of incompatibilities with clients software. But the
      product seems to be very badly documented at least for me.

      The idea is basically that a client uses their backup application
      (rubrik, veeam etc.) to connect to a domain(for example
      backup.mycompany.com) with their access and secret key and have
      access to their bucket only. and it must be over https/ssl.


      If someone has any knowledge to minio or any other product that might
      solve our problem I will be glad to hear from you! ?

      Thank you in advance!
      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at spectrumscale.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss


      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at spectrumscale.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/4c5ddab4/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 19991351.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/4c5ddab4/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/4c5ddab4/attachment-0001.gif>

From NISHAAN at za.ibm.com  Wed Oct 28 07:12:55 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Wed, 28 Oct 2020 09:12:55 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <2126571944.509878.1603866292369@privateemail.com>
References: <1109480230.484366.1603799162955@privateemail.com>
	<OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>
	<2126571944.509878.1603866292369@privateemail.com>
Message-ID: <OF5F31EC6E.6235CA80-ON4225860F.0027378A-4225860F.0027A28C@notes.na.collabserv.com>

Hi Andi

Sorry forgot to mention that I was told 5.1 will include the Swift Train
release (2.23). The change from swift3 middleware to s3api was done in the
Queens release (2.18)  so 5.1 will definitely have multipart support.

Kind Regards

Nishaan Docrat
Client Technical Specialist - Storage Systems
IBM Systems Hardware

Work: +27 (0)11 302 5001
Mobile: +27 (0)81 040 3793
Email: nishaan at za.ibm.com


From:	Andi Christiansen <andi at christiansen.xxx>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>,
            Nishaan Docrat <NISHAAN at za.ibm.com>
Date:	2020/10/28 08:24
Subject:	[EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.


Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or
5.0.5.1? Some...
                                                                            
                                                                            
                   This Message Is From an External Sender                  
                   This message came from outside your organization.        
                                                                            
                                                                            
Hi Nishaan,

Thanks for you reply.

When you say 5.1? is that 5.1.x.x or 5.0.5.1?

Some of the limitations we have encountered is the multipart upload not
being supported and some md5sum that the s3 api does that veeam actually
dont like. also interms of the management on the Scale GUI, that has to be
on one of the S3 CES nodes in order to be able to show project, container
etc... but when you have a HAProxy for enabling SSL then a GUI is not
available as they both use port 443?


i know min.io is not the full stack of S3 API commands but as far as i can
read it comes with more features out of the box than Scale S3 does,
multipart for an example...

I looked through your documentation and its very close to what we have set
up today and found to not work...

If multipart uploads would be supported today on scale S3 i would think
about still using scale for the s3 part but as i expect that you talk about
5.1.x.x i dont see that being released any time soon? and dont know if that
is actually going to be supported in that release then i cant wait for that
to happen..

Best Regards
Andi Christiansen
      On 10/27/2020 2:38 PM Nishaan Docrat <nishaan at za.ibm.com> wrote:


      Hi Andi

      The current S3 compatibility in Spectrum Scale is delivered via the
      Swift3 middleware. This middleware has since been replaced by s3api
      in later versions of Swift. Spectrum Scale 5.1 will make use of Swift
      Train release which will include the new s3api middleware.

      I've tested the S3 compatibility with a few applications including
      Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale
      Object store using S3 API) and Spectrum Protect etc. and haven't had
      any issue. I've also ran a few application tools to test for an S3
      compliant object stores and again had no issues.

      You can use s3compat to test the current compatibility.. Or you can
      check here for the current compatibility..
      https://docs.openstack.org/swift/latest/s3_compat.html

      Not sure if there is any other way to talk HTTPS without using
      HAProxy.

      In any case, I've documented the process to setup an S3 compliant
      object store including supporting virtual-hosted style bucket
      addressing which you can find here..

      https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/


      Most storage vendors including minio would not support the full S3
      API stack as alot of the calls are specific to AWS (like billing
      stuff etc.).

      Anyway, good luck with your testing.

      Kind Regards

      Nishaan Docrat
      Client Technical Specialist - Storage Systems
      IBM Systems Hardware

      Work: +27 (0)11 302 5001
      Mobile: +27 (0)81 040 3793
      Email: nishaan at za.ibm.com


      Inactive hide details for Andi Christiansen ---2020/10/27
      13:59:30---Hi all, We have over a longer period used the S3 API withi
      Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a
      longer period used the S3 API within spectrum Scale.. And that has
      shown that

      From: Andi Christiansen <andi at christiansen.xxx>
      To: "gpfsug-discuss at spectrumscale.org"
      <gpfsug-discuss at spectrumscale.org>
      Date: 2020/10/27 13:59
      Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API.
      Sent by: gpfsug-discuss-bounces at spectrumscale.org


      Hi all,


      We have over a longer period used the S3 API within spectrum Scale..
      And that has shown that it does not support very many applications
      because of limitations of the API..


      Has anyone got any experience with any other product we can deploy
      on-top of Spectrum Scale that will give us a true S3 API with full
      functionalities and able to answer on port 443? As of now we use
      HAProxy to forware ssl request back and forth from Scale S3 API.


      We have looked at MinIO which seems to be fairly simple and maybe
      might solve a lot of incompatibilities with clients software. But the
      product seems to be very badly documented at least for me.

      The idea is basically that a client uses their backup application
      (rubrik, veeam etc.) to connect to a domain(for example
      backup.mycompany.com) with their access and secret key and have
      access to their bucket only. and it must be over https/ssl.


      If someone has any knowledge to minio or any other product that might
      solve our problem I will be glad to hear from you! ?

      Thank you in advance!
      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at spectrumscale.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss


      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at spectrumscale.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/0aaf9fcd/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 17810834.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/0aaf9fcd/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/0aaf9fcd/attachment-0001.gif>

From luis.bolinches at fi.ibm.com  Wed Oct 28 07:15:21 2020
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Wed, 28 Oct 2020 07:15:21 +0000
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <OF4063F396.B7DA9D54-ON4225860F.0023CA79-4225860F.00251FA3@notes.na.collabserv.com>
Message-ID: <OF06DC9BE9.BBFA65F5-ON0025860F.0027441B-0025860F.0027DBD0@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.1__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment-0002.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.2__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment-0002.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.1__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment-0003.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.2__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment-0003.gif>

From NISHAAN at za.ibm.com  Wed Oct 28 07:45:45 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Wed, 28 Oct 2020 09:45:45 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <OF06DC9BE9.BBFA65F5-ON0025860F.0027441B-0025860F.0027DBD0@notes.na.collabserv.com>
References: <OF4063F396.B7DA9D54-ON4225860F.0023CA79-4225860F.00251FA3@notes.na.collabserv.com>
	<OF06DC9BE9.BBFA65F5-ON0025860F.0027441B-0025860F.0027DBD0@notes.na.collabserv.com>
Message-ID: <OF047A1414.5894B72F-ON4225860F.002A5F54-4225860F.002AA438@notes.na.collabserv.com>

Hi Luis

Thanks for your reply.. It should address Andi's issue as the underlying
Swift version is what is important and the functionality he needs is in the
latest releases (I was told 5.1 includes Swift Train which is the latest
version).

Am sure there is a beta program for Spectrum Scale.. Perhaps Andi should
speak to his software sales rep and ask to be included on it to get access
so that he can test.

Kind Regards

Nishaan Docrat
Client Technical Specialist - Storage Systems
IBM Systems Hardware

Work: +27 (0)11 302 5001
Mobile: +27 (0)81 040 3793
Email: nishaan at za.ibm.com


From:	"Luis Bolinches" <luis.bolinches at fi.ibm.com>
To:	gpfsug-discuss at spectrumscale.org
Cc:	gpfsug-discuss at spectrumscale.org
Date:	2020/10/28 09:29
Subject:	[EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.
Sent by:	gpfsug-discuss-bounces at spectrumscale.org


Hi

5.1.x is going GA very soon (TM). Would it address the issues Andi sees on
his environment or not I cannot say.

I can take it with Andi for more details on the GA date


--
Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations /
Salutacions
Luis Bolinches
Consultant IT Specialist
IBM Spectrum Scale development
Mobile Phone: +358503112585

https://www.youracclaim.com/user/luis-bolinches

Ab IBM Finland Oy
Laajalahdentie 23
00330 Helsinki
Uusimaa - Finland

"If you always give you will always have" --  Anonymous


 ----- Original message -----
 From: "Nishaan Docrat" <NISHAAN at za.ibm.com>
 Sent by: gpfsug-discuss-bounces at spectrumscale.org
 To: Andi Christiansen <andi at christiansen.xxx>
 Cc: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.
 Date: Wed, Oct 28, 2020 08:47


 Hi Andi

 The s3api middleware does support multipart uploads..

 https://docs.openstack.org/swift/latest/s3_compat.html

 The current version of Swift (PIKE) that is bundled with Spectrum Scale
 5.0.X doesn't..
 https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adm_ManagingOpenStackACLsviaAmazonS3API.htm


 According to the Spectrum Scale Roadmap, 5.1 is due out 2H20.. Not sure if
 someone from development can confirm the GA date.

 Does Veeam have a test utility? You could always test it using the current
 Swift AIO or if you can provide me with a test utility I can test that for
 you.

 Kind Regards

 Nishaan Docrat
 Client Technical Specialist - Storage Systems
 IBM Systems Hardware

 Work: +27 (0)11 302 5001
 Mobile: +27 (0)81 040 3793
 Email: nishaan at za.ibm.com


 Inactive hide details for Andi Christiansen ---2020/10/28 08:24:55---Hi
 Nishaan, Thanks for you reply.Andi Christiansen ---2020/10/28
 08:24:55---Hi Nishaan, Thanks for you reply.

 From: Andi Christiansen <andi at christiansen.xxx>
 To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>,
 Nishaan Docrat <NISHAAN at za.ibm.com>
 Date: 2020/10/28 08:24
 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.


 Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or
 5.0.5.1? Some...
                                                                            
                                                                            
               This Message Is From an External Sender                      
               This message came from outside your organization.            
                                                                            
                                                                            
 Hi Nishaan,

 Thanks for you reply.

 When you say 5.1? is that 5.1.x.x or 5.0.5.1?

 Some of the limitations we have encountered is the multipart upload not
 being supported and some md5sum that the s3 api does that veeam actually
 dont like. also interms of the management on the Scale GUI, that has to be
 on one of the S3 CES nodes in order to be able to show project, container
 etc... but when you have a HAProxy for enabling SSL then a GUI is not
 available as they both use port 443?


 i know min.io is not the full stack of S3 API commands but as far as i can
 read it comes with more features out of the box than Scale S3 does,
 multipart for an example...

 I looked through your documentation and its very close to what we have set
 up today and found to not work...

 If multipart uploads would be supported today on scale S3 i would think
 about still using scale for the s3 part but as i expect that you talk
 about 5.1.x.x i dont see that being released any time soon? and dont know
 if that is actually going to be supported in that release then i cant wait
 for that to happen..

 Best Regards
 Andi Christiansen
             On 10/27/2020 2:38 PM Nishaan Docrat <nishaan at za.ibm.com>
             wrote:


             Hi Andi

             The current S3 compatibility in Spectrum Scale is delivered
             via the Swift3 middleware. This middleware has since been
             replaced by s3api in later versions of Swift. Spectrum Scale
             5.1 will make use of Swift Train release which will include
             the new s3api middleware.

             I've tested the S3 compatibility with a few applications
             including Spectrum Scale itself (i.e. Cloud Data Sharing to
             another Scale Object store using S3 API) and Spectrum Protect
             etc. and haven't had any issue. I've also ran a few
             application tools to test for an S3 compliant object stores
             and again had no issues.

             You can use s3compat to test the current compatibility.. Or
             you can check here for the current compatibility..
             https://docs.openstack.org/swift/latest/s3_compat.html

             Not sure if there is any other way to talk HTTPS without using
             HAProxy.

             In any case, I've documented the process to setup an S3
             compliant object store including supporting virtual-hosted
             style bucket addressing which you can find here..

             https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/


             Most storage vendors including minio would not support the
             full S3 API stack as alot of the calls are specific to AWS
             (like billing stuff etc.).

             Anyway, good luck with your testing.

             Kind Regards

             Nishaan Docrat
             Client Technical Specialist - Storage Systems
             IBM Systems Hardware

             Work: +27 (0)11 302 5001
             Mobile: +27 (0)81 040 3793
             Email: nishaan at za.ibm.com


             Inactive hide details for Andi Christiansen ---2020/10/27
             13:59:30---Hi all, We have over a longer period used the S3
             API withiAndi Christiansen ---2020/10/27 13:59:30---Hi all, We
             have over a longer period used the S3 API within spectrum
             Scale.. And that has shown that

             From: Andi Christiansen <andi at christiansen.xxx>
             To: "gpfsug-discuss at spectrumscale.org"
             <gpfsug-discuss at spectrumscale.org>
             Date: 2020/10/27 13:59
             Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3
             API.
             Sent by: gpfsug-discuss-bounces at spectrumscale.org


             Hi all,


             We have over a longer period used the S3 API within spectrum
             Scale.. And that has shown that it does not support very many
             applications because of limitations of the API..


             Has anyone got any experience with any other product we can
             deploy on-top of Spectrum Scale that will give us a true S3
             API with full functionalities and able to answer on port 443?
             As of now we use HAProxy to forware ssl request back and forth
             from Scale S3 API.


             We have looked at MinIO which seems to be fairly simple and
             maybe might solve a lot of incompatibilities with clients
             software. But the product seems to be very badly documented at
             least for me.

             The idea is basically that a client uses their backup
             application(rubrik, veeam etc.) to connect to a domain(for
             example backup.mycompany.com) with their access and secret key
             and have access to their bucket only. and it must be over
             https/ssl.


             If someone has any knowledge to minio or any other product
             that might solve our problem I will be glad to hear from you!
             ?

             Thank you in advance!
             _______________________________________________
             gpfsug-discuss mailing list
             gpfsug-discuss at spectrumscale.org
             http://gpfsug.org/mailman/listinfo/gpfsug-discuss


             _______________________________________________
             gpfsug-discuss mailing list
             gpfsug-discuss at spectrumscale.org
             http://gpfsug.org/mailman/listinfo/gpfsug-discuss


 _______________________________________________
 gpfsug-discuss mailing list
 gpfsug-discuss at spectrumscale.org
 http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Ellei edell? ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3
Registered in Finland
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/3cef7908/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 16781831.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/3cef7908/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/3cef7908/attachment-0001.gif>

From luis.bolinches at fi.ibm.com  Wed Oct 28 07:51:30 2020
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Wed, 28 Oct 2020 07:51:30 +0000
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <OF047A1414.5894B72F-ON4225860F.002A5F54-4225860F.002AA438@notes.na.collabserv.com>
References: <OF047A1414.5894B72F-ON4225860F.002A5F54-4225860F.002AA438@notes.na.collabserv.com>,
	<OF4063F396.B7DA9D54-ON4225860F.0023CA79-4225860F.00251FA3@notes.na.collabserv.com><OF06DC9BE9.BBFA65F5-ON0025860F.0027441B-0025860F.0027DBD0@notes.na.collabserv.com>
Message-ID: <OF2F6C2E30.34B1F144-ON0025860F.002B29E9-0025860F.002B2B09@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/e02ad49e/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.1__=CDBB0C9CDFB9D9C48f9e8a93df938690918cCDB at .jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/e02ad49e/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.2__=CDBB0C9CDFB9D9C48f9e8a93df938690918cCDB at .gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/e02ad49e/attachment-0001.gif>

From Robert.Oesterlin at nuance.com  Thu Oct 29 11:16:13 2020
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Thu, 29 Oct 2020 11:16:13 +0000
Subject: [gpfsug-discuss] SSUG Digital Expert Talk: 11/4 - AI workloads on
 NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale
Message-ID: <77EA43ED-C430-42CA-872E-D2307F244775@nuance.com>

Reminder for our  upcoming expert talk:

SSUG::Digital: Scalable multi-node training for AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale
November 4 @ 16:15 - 17:45 GMT

Nvidia and IBM did a complex proof-of-concept to demonstrate the scaling of AI workload using Nvidia DGX, Red Hat OpenShift and IBM Spectrum Scale at the example of ResNet-50 and the segmentation of images using the Audi A2D2 dataset. The project team published an IBM Redpaper with all the technical details and will present the key learnings and results.

Registration link for Webex session: https://www.spectrumscaleug.org/event/ssugdigital-multi-node-training-for-ai-workloads/


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201029/38f0946b/attachment-0001.htm>

From kkr at lbl.gov  Thu Oct 29 21:43:02 2020
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Thu, 29 Oct 2020 14:43:02 -0700
Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a
	panel?
In-Reply-To: <B8A6C2AA-B4B0-473B-B7EC-C751F0EC9D7C@lbl.gov>
References: <B8A6C2AA-B4B0-473B-B7EC-C751F0EC9D7C@lbl.gov>
Message-ID: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>

Really? There?s nothing you want to ask about GPFS/Spectrum Scale? There will be access to developers and management alike, so I have to imagine you have something to ask? Don?t be shy. 

Please help make this a lively discussion by submitting a question, or two. 

Best,
Kristy

> On Oct 13, 2020, at 2:34 PM, Kristy Kallback-Rose <kkr at lbl.gov> wrote:
> 
> Hi all,
> 
> 	By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we?re planning a couple 90-minute sessions and would like to do a panel during one of them. We?ll hope to do live Q&A, like an in-person Ask Me Anything session, but it?s probably a good idea to have a bank of questions ready as well, Also, that way your question may get asked, even if you can?t make the live session ?we?ll record these sessions for later viewing. 
> 
> 	So, please send your questions for the panel and we?ll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots:
> 
> November 16th - 8:00 AM Pacific/3:00 PM UTC 
> 
> November 18th - 8:00 AM Pacific/3:00 PM UTC 
> 
> Best,
> Kristy
> 
> Kristy Kallback-Rose
> Senior HPC Storage Systems Analyst
> National Energy Research Scientific Computing Center
> Lawrence Berkeley National Laboratory
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201029/ec75fc5e/attachment-0001.htm>

From kkr at lbl.gov  Thu Oct 29 21:49:34 2020
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Thu, 29 Oct 2020 14:49:34 -0700
Subject: [gpfsug-discuss] SC20  Sessions - Dates and times are settled,
	please join us!
Message-ID: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov>

Hi all,

	The Spectrum Scale User Group will be hosting two 90 minute sessions at SC20 this year and we hope you can join us. The first one is:

 "Storage for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST 

and the second one is 

"What's new in Spectrum Scale 5.1?" and will be held Wednesday, Nov. 18th from 11:00-12:30 EST.  

Please see the calendar at https://www.spectrumscaleug.org/eventslist/2020-11/ and register by clicking on a session on the calendar and then the "Please register here to join the session" link.

Best,
Kristy

Kristy Kallback-Rose
Senior HPC Storage Systems Analyst
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory


From heinrich.billich at id.ethz.ch  Fri Oct 30 12:21:58 2020
From: heinrich.billich at id.ethz.ch (Billich  Heinrich Rainer (ID SD))
Date: Fri, 30 Oct 2020 12:21:58 +0000
Subject: [gpfsug-discuss] 'ganesha_mgr display_export - client not listed
Message-ID: <660DD807-C723-44EF-BC51-57EFB296FFC4@id.ethz.ch>

Hello,

Some nfsv4 client of ganesha does not show up in the output of 'ganesha_mgr display_export'. The client has an active mount, but also shows some nfs issues, some commands did hang, the process just stays in state D (uninterruptible sleep) according to 'ps', but not the whole mount.  

I just wonder if the client's IP should always show up in the output of display_export once the client did issue a mount call and if the absence indicates that something is broken. 
Gutr,gut,
Putting it the other way round: When is a client listed in the output of display_export and when is it removed from the list?

We do collect more debug data, this is just something that catched my eye.
Thank you,

Heiner

We run ganesha 2.7.5-ibm058.05 on a spectrum scale system on RedHat 7.7.

I crosspost to the gpfsug mailing list.

-- 
=======================
Heinrich Billich
ETH Z?rich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich at id.ethz.ch
========================
 
 
# ganesha_mgr  display_export 37
Display export with id 37
export 37: path = /xxxx/yyy, pseudo = /xxx/yyy , tag = /xxx/yyy
 Client type,  CIDR version, CIDR address, CIDR mask, CIDR proto, Anonymous UID, Anonymous GID, Attribute timeout, Options, Set
  a.b.c.198/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.143/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.236/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.34/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.70/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.71/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 *,  0,  0,  0,  0,  4294967294,  4294967294,  0,  1126187490, 1081209831


From skylar2 at uw.edu  Fri Oct 30 14:01:37 2020
From: skylar2 at uw.edu (Skylar Thompson)
Date: Fri, 30 Oct 2020 07:01:37 -0700
Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a
 panel?
In-Reply-To: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>
References: <B8A6C2AA-B4B0-473B-B7EC-C751F0EC9D7C@lbl.gov>
	<3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>
Message-ID: <20201030140137.hakhxwppcmaoixy6@thargelion>

Here's one:

How is IBM working to improve the integration between TSM and GPFS? We're
in the biomedical space and have some overlapping regulatory requirements
around retention, which translate to complicated INCLUDE/EXCLUDE rules that
mmbackup has always had trouble processing. In particular, we need to be
able to INCLUDE particular paths to set a management class, but then
EXCLUDE particular paths, which results in mmbackup generating 
file lists for dsmc including those excluded paths, which dsmc can exclude
but it logs every single one every time it runs.

On Thu, Oct 29, 2020 at 02:43:02PM -0700, Kristy Kallback-Rose wrote:
> Really? There???s nothing you want to ask about GPFS/Spectrum Scale? There will be access to developers and management alike, so I have to imagine you have something to ask??? Don???t be shy. 
> 
> Please help make this a lively discussion by submitting a question, or two. 
> 
> Best,
> Kristy
> 
> > On Oct 13, 2020, at 2:34 PM, Kristy Kallback-Rose <kkr at lbl.gov> wrote:
> > 
> > Hi all,
> > 
> > 	By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we???re planning a couple 90-minute sessions and would like to do a panel during one of them. We???ll hope to do live Q&A, like an in-person Ask Me Anything session, but it???s probably a good idea to have a bank of questions ready as well, Also, that way your question may get asked, even if you can???t make the live session ???we???ll record these sessions for later viewing. 
> > 
> > 	So, please send your questions for the panel and we???ll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots:
> > 
> > November 16th - 8:00 AM Pacific/3:00 PM UTC 
> > 
> > November 18th - 8:00 AM Pacific/3:00 PM UTC 
> > 
> > Best,
> > Kristy
> > 
> > Kristy Kallback-Rose
> > Senior HPC Storage Systems Analyst
> > National Energy Research Scientific Computing Center
> > Lawrence Berkeley National Laboratory
> > 
> 

> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-- 
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department (UW Medicine), System Administrator
-- Foege Building S046, (206)-685-7354
-- Pronouns: He/Him/His


From cblack at nygenome.org  Fri Oct 30 14:19:24 2020
From: cblack at nygenome.org (Christopher Black)
Date: Fri, 30 Oct 2020 14:19:24 +0000
Subject: [gpfsug-discuss] SC20  Sessions - Dates and times are settled,
 please join us!
In-Reply-To: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov>
References: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov>
Message-ID: <62E7471D-02B9-4C27-B0F0-4038CCB2C66E@nygenome.org>

Could you talk about upcoming work to address excessive prefetch when reading small fractions of many large files?
Some bioinformatics workloads have a client node reading relatively small regions of multiple 50GB+ files. We've seen this trigger excessive prefetch bandwidth (especially on 16MB block filesystem). Investigation shows that much of the prefetched data is never read, but cache gets full, evicts blocks, then more prefetch happens. We can avoid this by turning prefetch off, but that reduces speed of other workloads that read full files sequentially.  Turning prefetch on and off based on job won't work well for our users.

We've heard this would be addressed in gpfs 5.1 at the earliest and have provided an example workload to devs. They've done some great analysis and determined the problem is worse on large (16M) block filesystems (which are now the recommended and default on new ess filesystems with sub-block allocation enabled).

Best,
Chris

?On 10/29/20, 5:49 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Kristy Kallback-Rose" <gpfsug-discuss-bounces at spectrumscale.org on behalf of kkr at lbl.gov> wrote:

    Hi all,

    The Spectrum Scale User Group will be hosting two 90 minute sessions at SC20 this year and we hope you can join us. The first one is:

     "Storage for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST

    and the second one is

    "What's new in Spectrum Scale 5.1?" and will be held Wednesday, Nov. 18th from 11:00-12:30 EST.

    Please see the calendar at https://urldefense.com/v3/__https://www.spectrumscaleug.org/eventslist/2020-11/__;!!C6sPl7C9qQ!G0wT65UH3HoMnjBM6_ZAVfZwWwJz5SoLE5gpB_LM0N8SNSU3TXItF31dfxG_8Pow$  and register by clicking on a session on the calendar and then the "Please register here to join the session" link.

    Best,
    Kristy

    Kristy Kallback-Rose
    Senior HPC Storage Systems Analyst
    National Energy Research Scientific Computing Center
    Lawrence Berkeley National Laboratory

    _______________________________________________
    gpfsug-discuss mailing list
    gpfsug-discuss at spectrumscale.org
    https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!C6sPl7C9qQ!G0wT65UH3HoMnjBM6_ZAVfZwWwJz5SoLE5gpB_LM0N8SNSU3TXItF31df0lybvoA$

________________________________

This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email.

From jonathan.buzzard at strath.ac.uk  Fri Oct  2 17:14:12 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Fri, 2 Oct 2020 17:14:12 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
Message-ID: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>


What if any are the rules around running additional services on DSS/ESS 
nodes with regard to support? Let me outline our scenario

Our main cluster uses 10Gbps ethernet for storage with the DSS-G nodes 
hooked up with redundant 40Gbps ethernet.

However we have an older cluster that is used for undergraduate teaching 
that only has 1Gbps ethernet and QDR Infiniband. With no money to 
upgrade this to 10Gbps ethernet to support this we flipped one of the 
ports on the ConnectX4 cards on each DSS-G node to Infiniband and using 
IPoIB run the teaching nodes in this way.

However it means that we need an Ethernet to Infiniband gateway as the 
ethernet only connected nodes want to talk to the Infiniband connected 
ones on their Infiniband address. Not a problem we grabbed an old spare 
machine installed CentOS and configured it up to act as a bridge, and 
deploy a custom route to all the ethernet only connected nodes. It has 
been working fine for a couple of years now.

The problem is that this becomes firstly a single point of failure, on 
hardware that is six years old now. Secondly to apply updates on the 
gateway machine means all the teaching nodes have to be drained and GPFS 
umounted to reboot the machine after updates have been installed. It is 
currently not getting patched as frequently as I would like (and 
required by the Scottish government).

So thinking about it I have come to the conclusion that the ideal 
situation would be to use the DSS-G nodes as the gateway and run 
keepalived to move the gateway ethernet IP address between the two 
machines. It is idea because as long as one DSS-G node is up then there 
is a functioning gateway and nodes don't get ejected from the cluster. 
If both DSS-G nodes are down then there is no GPFS to mount anyway and 
lack of a gateway is a moot point.

I grabbed a couple of the teaching compute nodes in the summer and 
trialed it out. It works a treat.

I now need to check IBM are not going to throw a wobbler down the line 
if I need to get support before deploying it to the DSS-G nodes :-)


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From abeattie at au1.ibm.com  Fri Oct  2 23:19:15 2020
From: abeattie at au1.ibm.com (Andrew Beattie)
Date: Fri, 2 Oct 2020 22:19:15 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
Message-ID: <OF777393BE.20C1535F-ON002585F5.007A9C47-1601677155073@notes.na.collabserv.com>


Jonathan,
I suggest you get a formal statement from Lenovo as the DSS-G Platform is
no longer an IBM platform.

But for ESS based platforms the answer would be, it is not supported to run
anything on the IO Servers other than GNR and the relevant Scale management
services, due to the fact that if you lose an IO Server, or if you in an
extended maintenance window the Server needs to host all the work that
would be being performed by both IO servers.

I don't know if Lenovo have different point if view.

Regards,

Andrew

Sent from my iPhone

> On 3 Oct 2020, at 02:14, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk>
wrote:
>
> 
> What if any are the rules around running additional services on DSS/ESS
> nodes with regard to support? Let me outline our scenario
>
> Our main cluster uses 10Gbps ethernet for storage with the DSS-G nodes
> hooked up with redundant 40Gbps ethernet.
>
> However we have an older cluster that is used for undergraduate teaching
> that only has 1Gbps ethernet and QDR Infiniband. With no money to
> upgrade this to 10Gbps ethernet to support this we flipped one of the
> ports on the ConnectX4 cards on each DSS-G node to Infiniband and using
> IPoIB run the teaching nodes in this way.
>
> However it means that we need an Ethernet to Infiniband gateway as the
> ethernet only connected nodes want to talk to the Infiniband connected
> ones on their Infiniband address. Not a problem we grabbed an old spare
> machine installed CentOS and configured it up to act as a bridge, and
> deploy a custom route to all the ethernet only connected nodes. It has
> been working fine for a couple of years now.
>
> The problem is that this becomes firstly a single point of failure, on
> hardware that is six years old now. Secondly to apply updates on the
> gateway machine means all the teaching nodes have to be drained and GPFS
> umounted to reboot the machine after updates have been installed. It is
> currently not getting patched as frequently as I would like (and
> required by the Scottish government).
>
> So thinking about it I have come to the conclusion that the ideal
> situation would be to use the DSS-G nodes as the gateway and run
> keepalived to move the gateway ethernet IP address between the two
> machines. It is idea because as long as one DSS-G node is up then there
> is a functioning gateway and nodes don't get ejected from the cluster.
> If both DSS-G nodes are down then there is no GPFS to mount anyway and
> lack of a gateway is a moot point.
>
> I grabbed a couple of the teaching compute nodes in the summer and
> trialed it out. It works a treat.
>
> I now need to check IBM are not going to throw a wobbler down the line
> if I need to get support before deploying it to the DSS-G nodes :-)
>
>
> JAB.
>
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201002/6b69501c/attachment-0002.htm>

From jonathan.buzzard at strath.ac.uk  Sat Oct  3 11:06:41 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sat, 3 Oct 2020 11:06:41 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF777393BE.20C1535F-ON002585F5.007A9C47-1601677155073@notes.na.collabserv.com>
References: <OF777393BE.20C1535F-ON002585F5.007A9C47-1601677155073@notes.na.collabserv.com>
Message-ID: <7af0ac41-a280-5ecd-3658-7af761a4bf9b@strath.ac.uk>

On 02/10/2020 23:19, Andrew Beattie wrote:
> Jonathan,
> I suggest you get a formal statement from Lenovo as the DSS-G Platform 
> is no longer an IBM platform.
> 
> But for ESS based platforms the answer would be, it is not supported to 
> run anything on the IO Servers other than GNR and the relevant Scale 
> management services, due to the fact that if you lose an IO Server, or 
> if you in an extended maintenance window the Server needs to host all 
> the work that would be being performed by both IO servers.
> 

In the past ~500 days the Infiniband to Ethernet gateway has shifted 
~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the 
DSS-G nodes have each shifted several PB. The proposed additional 
traffic is a drop in the ocean.

On my actual routers which shift much more data (over 300TB externally) 
with an uptime of ~180 days at the moment the CPU time consumed by 
keepalived is just under 31 minutes or about 8 seconds a day. These are 
much punier CPU's too. The proposed additional CPU usage is another drop 
in the ocean.

Given Lenovo sold the *same* configuration with x3650's and SR650's the 
"need all the CPU grunt" is somewhat fishy. Between the bid being 
submitted and actual tender award the SR650's came out and we paid a bit 
extra to uplift to the newer server hardware with exactly the same disk 
configuration. I believe IBM have done the same with the ESS/GNR servers 
too over time the same applies there too.

IMHO given keepalived is a base RHEL package, IBM/Lenovo should be 
offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as 
a supported configuration for mixed network technology clusters :-)

Running a couple extra servers for this purpose is obnoxious from an 
environmental standpoint. That's IBM's green credentials out the window 
if you ask me.

I would note under those rules running a Nagios, Zabbix etc. client on 
the nodes is not permitted either. I would suggest that most sites would 
be rather unhappy about that :-)


 > I don't know if Lenovo have different point if view.
 >

Problem is when I ring up for support on my DSS-G I speak to an IBM 
employee not a Lenovo one :-)


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From abeattie at au1.ibm.com  Sat Oct  3 11:55:05 2020
From: abeattie at au1.ibm.com (Andrew Beattie)
Date: Sat, 3 Oct 2020 10:55:05 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <7af0ac41-a280-5ecd-3658-7af761a4bf9b@strath.ac.uk>
Message-ID: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>


Why do you need to run any kind of monitoring client on an IO server the
GUI / performance monitor already does all of that work for you and
collects the data on the dedicated EMS server.

If you have a small storage environment the. Yes the processor and memory
may feel like overkill, but tuned appropriately an IO server will use all
the memory you can give it to drive IO performance,

If you want to run a hybrid / non standard architecture then the IBM ESS /
DGSS platform may not be the right platform in comparison to a build your
own architecture, how ever you then take all the support issues onto your
self rather than it being the vendors problem.

Sent from my iPhone

> On 3 Oct 2020, at 20:06, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk>
wrote:
>
> On 02/10/2020 23:19, Andrew Beattie wrote:
>> Jonathan,
>> I suggest you get a formal statement from Lenovo as the DSS-G Platform
>> is no longer an IBM platform.
>>
>> But for ESS based platforms the answer would be, it is not supported to
>> run anything on the IO Servers other than GNR and the relevant Scale
>> management services, due to the fact that if you lose an IO Server, or
>> if you in an extended maintenance window the Server needs to host all
>> the work that would be being performed by both IO servers.
>>
>
> In the past ~500 days the Infiniband to Ethernet gateway has shifted
> ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the
> DSS-G nodes have each shifted several PB. The proposed additional
> traffic is a drop in the ocean.
>
> On my actual routers which shift much more data (over 300TB externally)
> with an uptime of ~180 days at the moment the CPU time consumed by
> keepalived is just under 31 minutes or about 8 seconds a day. These are
> much punier CPU's too. The proposed additional CPU usage is another drop
> in the ocean.
>
> Given Lenovo sold the *same* configuration with x3650's and SR650's the
> "need all the CPU grunt" is somewhat fishy. Between the bid being
> submitted and actual tender award the SR650's came out and we paid a bit
> extra to uplift to the newer server hardware with exactly the same disk
> configuration. I believe IBM have done the same with the ESS/GNR servers
> too over time the same applies there too.
>
> IMHO given keepalived is a base RHEL package, IBM/Lenovo should be
> offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as
> a supported configuration for mixed network technology clusters :-)
>
> Running a couple extra servers for this purpose is obnoxious from an
> environmental standpoint. That's IBM's green credentials out the window
> if you ask me.
>
> I would note under those rules running a Nagios, Zabbix etc. client on
> the nodes is not permitted either. I would suggest that most sites would
> be rather unhappy about that :-)
>
>
>> I don't know if Lenovo have different point if view.
>>
>
> Problem is when I ring up for support on my DSS-G I speak to an IBM
> employee not a Lenovo one :-)
>
>
> JAB.
>
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201003/b5de5e36/attachment-0002.htm>

From luis.bolinches at fi.ibm.com  Sat Oct  3 12:19:36 2020
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Sat, 3 Oct 2020 11:19:36 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
Message-ID: <OFBC4CB60E.7F91267F-ON002585F6.003E3808-1601723976459@notes.na.collabserv.com>

Are you mixing those ESS DSS in the same cluster? Or you are only running DSS


https://www.ibm.com/support/knowledgecenter/SSYSP8/gnrfaq.html?view=kc#supportqs__building

Mixing DSS and ESS in the same cluster is not a supported configuration.

You really need to talk with Lenovo as is your vendor. The fact that in your region your support is being given by an IBMer or not is not a relevant point. High enough in the chain always will end at IBM on any region as GNR is IBM tech for 17 years (yes 17) so if weird enough even on regions where Lenovo might do even third level it might end on development and/or research. But that is a Lenovo/IBM agreement not you and IBM. 

So please get the support statement from Lenovo about this and pls share it if you want/can so we all learn their position. 

Thanks. 


--
Cheers

> On 3. Oct 2020, at 13.55, Andrew Beattie <abeattie at au1.ibm.com> wrote:
> 
> ?
> Why do you need to run any kind of monitoring client on an IO server the GUI / performance monitor already does all of that work for you and collects the data on the dedicated EMS server.
> 
> If you have a small storage environment the. Yes the processor and memory may feel like overkill, but tuned appropriately an IO server will use all the memory you can give it to drive IO performance, 
> 
> If you want to run a hybrid / non standard architecture then the IBM ESS / DGSS platform may not be the right platform in comparison to a build your own architecture, how ever you then take all the support issues onto your self rather than it being the vendors problem. 
> 
> Sent from my iPhone
> 
> > On 3 Oct 2020, at 20:06, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk> wrote:
> > 
> > On 02/10/2020 23:19, Andrew Beattie wrote:
> >> Jonathan,
> >> I suggest you get a formal statement from Lenovo as the DSS-G Platform 
> >> is no longer an IBM platform.
> >> 
> >> But for ESS based platforms the answer would be, it is not supported to 
> >> run anything on the IO Servers other than GNR and the relevant Scale 
> >> management services, due to the fact that if you lose an IO Server, or 
> >> if you in an extended maintenance window the Server needs to host all 
> >> the work that would be being performed by both IO servers.
> >> 
> > 
> > In the past ~500 days the Infiniband to Ethernet gateway has shifted 
> > ~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the 
> > DSS-G nodes have each shifted several PB. The proposed additional 
> > traffic is a drop in the ocean.
> > 
> > On my actual routers which shift much more data (over 300TB externally) 
> > with an uptime of ~180 days at the moment the CPU time consumed by 
> > keepalived is just under 31 minutes or about 8 seconds a day. These are 
> > much punier CPU's too. The proposed additional CPU usage is another drop 
> > in the ocean.
> > 
> > Given Lenovo sold the *same* configuration with x3650's and SR650's the 
> > "need all the CPU grunt" is somewhat fishy. Between the bid being 
> > submitted and actual tender award the SR650's came out and we paid a bit 
> > extra to uplift to the newer server hardware with exactly the same disk 
> > configuration. I believe IBM have done the same with the ESS/GNR servers 
> > too over time the same applies there too.
> > 
> > IMHO given keepalived is a base RHEL package, IBM/Lenovo should be 
> > offering running Infiniband to Ethernet gateways on the DSS/ESS nodes as 
> > a supported configuration for mixed network technology clusters :-)
> > 
> > Running a couple extra servers for this purpose is obnoxious from an 
> > environmental standpoint. That's IBM's green credentials out the window 
> > if you ask me.
> > 
> > I would note under those rules running a Nagios, Zabbix etc. client on 
> > the nodes is not permitted either. I would suggest that most sites would 
> > be rather unhappy about that :-)
> > 
> > 
> >> I don't know if Lenovo have different point if view.
> >> 
> > 
> > Problem is when I ring up for support on my DSS-G I speak to an IBM 
> > employee not a Lenovo one :-)
> > 
> > 
> > JAB.
> > 
> > -- 
> > Jonathan A. Buzzard Tel: +44141-5483420
> > HPC System Administrator, ARCHIE-WeSt.
> > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
> > 
> 

Ellei edell? ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3 
Registered in Finland

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201003/7806094c/attachment-0002.htm>

From jonathan.buzzard at strath.ac.uk  Sat Oct  3 18:16:33 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sat, 3 Oct 2020 18:16:33 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
References: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
Message-ID: <d25eba8f-244b-f610-a49f-8826e77ec6b6@strath.ac.uk>

On 03/10/2020 11:55, Andrew Beattie wrote:
> Why do you need to run any kind of monitoring client on an IO server the 
> GUI / performance monitor already does all of that work for you and 
> collects the data on the dedicated EMS server.

Because any remotely sensible admin demands a single pane service 
monitoring system. If I have to look at A for everything but my DSS-G 
and B for my DSS-G that's an epic fail.

I often feel there is a huge disjuncture between the people that develop 
systems and those that look after them; they think the world revolves 
around them. It is clear this is one of those  cases.

> 
> If you have a small storage environment the. Yes the processor and 
> memory may feel like overkill, but tuned appropriately an IO server will 
> use all the memory you can give it to drive IO performance,

Right but the SR650's came with not only more CPU but more RAM than the 
x3650's. In which case why only 192GB of RAM? The SR650's can take much 
more than that. Why not 384GB of RAM :-) Right now we have a shade over 
50GB of RAM being unused. Been way for like ever because we naughtily 
have a influx DB client setup for a Grafana dashboard. We also 
presumably naughtily have remote syslog to Splunk.

> 
> If you want to run a hybrid / non standard architecture then the IBM ESS 
> / DGSS platform may not be the right platform in comparison to a build 
> your own architecture, how ever you then take all the support issues 
> onto your self rather than it being the vendors problem.
> 
I don't see anything that says you can't have some clients ethernet 
connected and some Infiniband connected. That of course requires a 
gateway, and the most logical place to put it is on the ESS or DSS nodes 
IMHO. I will see what Lenovo has to say, but looks like the IBM position 
is decidedly let's burn the planet, who gives a dam.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From jonathan.buzzard at strath.ac.uk  Sat Oct  3 18:16:39 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sat, 3 Oct 2020 18:16:39 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OFBC4CB60E.7F91267F-ON002585F6.003E3808-1601723976459@notes.na.collabserv.com>
References: <OFBC4CB60E.7F91267F-ON002585F6.003E3808-1601723976459@notes.na.collabserv.com>
Message-ID: <d475e10b-2bea-aac3-3031-3c6dc348ce86@strath.ac.uk>

On 03/10/2020 12:19, Luis Bolinches wrote:
> Are you mixing those ESS DSS in the same cluster? Or you are only 
> running DSS
> 

Only running DSS. We are too far down the rabbit hole to ever switch to 
ESS now.

> 
> Mixing DSS and ESS in the same cluster is not a supported configuration.
>

I know, it means you can never ever migrate your storage from DSS to ESS 
without a full backup and restore. Who with any significant amount of 
storage is going to want to do that? The logic behind this escapes me, 
or perhaps in that scenario IBM might relax the rules for the migration 
period.


> You really need to talk with Lenovo as is your vendor. The fact that in 
> your region your support is being given by an IBMer or not is not a 
> relevant point. High enough in the chain always will end at IBM on any 
> region as GNR is IBM tech for 17 years (yes 17) so if weird enough even 
> on regions where Lenovo might do even third level it might end on 
> development and/or research. But that is a Lenovo/IBM agreement not you 
> and IBM.
> 
> So please get the support statement from Lenovo about this and pls share 
> it if you want/can so we all learn their position.
> 

Will attempt that, though I do think it should be a supported config out 
the box :-)


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From luis.bolinches at fi.ibm.com  Sun Oct  4 10:29:34 2020
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Sun, 4 Oct 2020 09:29:34 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <d475e10b-2bea-aac3-3031-3c6dc348ce86@strath.ac.uk>
Message-ID: <OF078DF2EA.0852E4DE-ON002585F7.00342558-1601803774728@notes.na.collabserv.com>

Hi

As stated on the same link you can do remote mounts from each other and be a supported setup. 

? You can use the remote mount feature of IBM Spectrum Scale to share file system data across clusters.?

--
Cheers

> On 3. Oct 2020, at 20.16, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk> wrote:
> 
> ?On 03/10/2020 12:19, Luis Bolinches wrote:
>> Are you mixing those ESS DSS in the same cluster? Or you are only
>> running DSS
> 
> Only running DSS. We are too far down the rabbit hole to ever switch to 
> ESS now.
> 
>> Mixing DSS and ESS in the same cluster is not a supported configuration.
> 
> I know, it means you can never ever migrate your storage from DSS to ESS 
> without a full backup and restore. Who with any significant amount of 
> storage is going to want to do that? The logic behind this escapes me, 
> or perhaps in that scenario IBM might relax the rules for the migration 
> period.
> 
> 
>> You really need to talk with Lenovo as is your vendor. The fact that in
>> your region your support is being given by an IBMer or not is not a
>> relevant point. High enough in the chain always will end at IBM on any
>> region as GNR is IBM tech for 17 years (yes 17) so if weird enough even
>> on regions where Lenovo might do even third level it might end on
>> development and/or research. But that is a Lenovo/IBM agreement not you
>> and IBM.
>> So please get the support statement from Lenovo about this and pls share
>> it if you want/can so we all learn their position.
> 
> Will attempt that, though I do think it should be a supported config out 
> the box :-)
> 
> 
> JAB.
> 
> -- 
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Ellei edell? ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3 
Registered in Finland

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201004/43585e71/attachment-0002.htm>

From jonathan.buzzard at strath.ac.uk  Sun Oct  4 11:17:30 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sun, 4 Oct 2020 11:17:30 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF078DF2EA.0852E4DE-ON002585F7.00342558-1601803774728@notes.na.collabserv.com>
References: <OF078DF2EA.0852E4DE-ON002585F7.00342558-1601803774728@notes.na.collabserv.com>
Message-ID: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk>

On 04/10/2020 10:29, Luis Bolinches wrote:
> Hi
> 
> As stated on the same link you can do remote mounts from each other and 
> be a supported setup.
> 
> ??You can use the remote mount feature of IBM Spectrum Scale to share 
> file system data across clusters.?
> 

You can, but imagine I have a DSS-G cluster, with 2PB of storage on it 
which is quite modest in 2020. It is now end of life and for whatever 
reason I decide I want to move to ESS instead.

What any sane storage admin want to do at this stage is set the ESS, add 
the ESS nodes to the existing cluster on the DSS-G then do a bit of 
mmadddisk/mmdeldisk and sit back while the data is seemlessly moved from 
the DSS-G to the ESS. Admittedly this might take a while :-)

Then once all the data is moved a bit of mmdelnode and bingo the storage 
has been migrated from DSS-G to ESS with zero downtime.

As that is not allowed for what I presume are commercial reasons (you 
could do it in reverse and presumable that is what IBM don't want) then 
once you are down the rabbit hole of one type of storage the you are not 
going to switch to a different one.

You need to look at it from the perspective of the users. They frankly 
could not give a monkeys what storage solution you are using. All they 
care about is having usable storage and large amounts of downtime to 
switch from one storage type to another is not really acceptable.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From olaf.weiser at de.ibm.com  Mon Oct  5 07:19:40 2020
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Mon, 5 Oct 2020 06:19:40 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk>
References: <7ef58f7d-1c70-97d7-100d-395c403d6199@strath.ac.uk>,
	<OF078DF2EA.0852E4DE-ON002585F7.00342558-1601803774728@notes.na.collabserv.com>
Message-ID: <OF7BE40A26.30F06268-ON002585F8.001E36FC-002585F8.0022C2BE@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/47f72836/attachment-0002.htm>

From jordi.caubet at es.ibm.com  Mon Oct  5 07:27:39 2020
From: jordi.caubet at es.ibm.com (Jordi Caubet Serrabou)
Date: Mon, 5 Oct 2020 06:27:39 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF7BE40A26.30F06268-ON002585F8.001E36FC-002585F8.0022C2BE@notes.na.collabserv.com>
Message-ID: <OF2E966B1D.F37B6B20-ON002585F8.00237D51-1601879259249@notes.na.collabserv.com>

?Coming to the routing point, is there any reason why you need it ? I mean, this is because GPFS trying to connect between compute nodes or a reason outside GPFS scope ?
If the reason is GPFS,  imho best approach - without knowledge of the licensing you have - would be to use separate clusters: a storage cluster and two compute clusters.

Both compute clusters join using multicluster setup the storage cluster. There is no need both compute clusters see each other, they only need to see the storage cluster. One of the clusters using the 10G, the other cluster using the IPoIB interface.
You need at least three quorum nodes in each compute cluster but if licensing is per drive on the DSS, it is covered.
--
Jordi Caubet Serrabou
IBM Software Defined Infrastructure (SDI) and Flash Technical Sales Specialist
Technical Computing and HPC IT Specialist and Architect
Ext. Phone: (+34) 679.79.17.84 (internal 55834)
E-mail: jordi.caubet at es.ibm.com

> On 5 Oct 2020, at 08:19, Olaf Weiser <olaf.weiser at de.ibm.com> wrote:
> 
> ?
> let me add a few comments from some very successful large installations in Eruope
>  
> # InterOP
> Even though (as Luis pointed to) , there is no support statement to run intermix DSS/ESS in general, it was ~, and is, and will be, ~ allowed for short term purposes, such as e.g migration.
> The reason to not support those DSS/ESS mixed configuration in general is simply driven by the fact, that different release version of DSS/ESS potentially (not in every release, but sometimes)  comes with different driver levels, (e.g. MOFED), OS, RDMA-settings, GPFS tuning,  etc...
> Those changes can have an impact/multiple impacts and therefore, we do not support that in general. Of course -and this would be the advice for every one - if you are faced the need to run a mixed configuration for e.g. a migration and/or e.g. cause of you need to temporary provide space etc... contact you IBM representative and settle to plan that accordingly..
> There will be (likely) some additional requirements/dependencies defined  like  driver versions, OS,  and/or Scale versions, but you'll get a chance to run mixed configuration - temporary limited to your specific scenario.
>  
> # Monitoring
> No doubt, monitoring is essential and absolutely needed. - and/but - IBM wants customers to be very sensitive, what kind of additional software (=workload) gets installed on the ESS-IO servers. BTW, this rule applies as well to any other important GPFS node with special roles (e.g. any other NSD server etc)
> But given the fact, that customer's usually manage and monitor their server farms from a central point of control (any 3rd party software), it is common/ best practice , that additionally monitor software(clients/endpoints) has to run on GPFS nodes, so as on ESS nodes too.
>  
> If that way of acceptance applies for DSS too, you may want to double check with Lenovo ?!
>  
>  
> #additionally GW functions
> It would be a hot iron, to general allow routing on IO nodes. Similar to the mixed support approach, the field variety for such a statement would be hard(==impossible) to manage. As we all agree, additional network traffic can (and in fact will) impact GPFS.
> In your special case, the expected data rates seems to me more than ok and acceptable to go with your suggested config (as long workloads remain on that level / monitor it accordingly as you are already obviously doing) 
> Again,to be on the safe side.. contact your IBM representative and I'm sure you 'll find a way..
>  
>  
>  
> kind regards....
> olaf
>  
>  
> ----- Original message -----
> From: Jonathan Buzzard <jonathan.buzzard at strath.ac.uk>
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug-discuss at spectrumscale.org
> Cc:
> Subject: [EXTERNAL] Re: [gpfsug-discuss] Services on DSS/ESS nodes
> Date: Sun, Oct 4, 2020 12:17 PM
>  
> On 04/10/2020 10:29, Luis Bolinches wrote:
> > Hi
> >
> > As stated on the same link you can do remote mounts from each other and
> > be a supported setup.
> >
> > ? You can use the remote mount feature of IBM Spectrum Scale to share
> > file system data across clusters.?
> >
> 
> You can, but imagine I have a DSS-G cluster, with 2PB of storage on it
> which is quite modest in 2020. It is now end of life and for whatever
> reason I decide I want to move to ESS instead.
> 
> What any sane storage admin want to do at this stage is set the ESS, add
> the ESS nodes to the existing cluster on the DSS-G then do a bit of
> mmadddisk/mmdeldisk and sit back while the data is seemlessly moved from
> the DSS-G to the ESS. Admittedly this might take a while :-)
> 
> Then once all the data is moved a bit of mmdelnode and bingo the storage
> has been migrated from DSS-G to ESS with zero downtime.
> 
> As that is not allowed for what I presume are commercial reasons (you
> could do it in reverse and presumable that is what IBM don't want) then
> once you are down the rabbit hole of one type of storage the you are not
> going to switch to a different one.
> 
> You need to look at it from the perspective of the users. They frankly
> could not give a monkeys what storage solution you are using. All they
> care about is having usable storage and large amounts of downtime to
> switch from one storage type to another is not really acceptable.
> 
> 
> JAB.
> 
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
>  
>  
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 

Salvo indicado de otro modo m?s arriba / Unless stated otherwise above:
International Business Machines, S.A.
Santa Hortensia, 26-28, 28002 Madrid
Registro Mercantil de Madrid; Folio 1; Tomo 1525; Hoja M-28146
CIF A28-010791

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/1846e9d6/attachment-0002.htm>

From S.J.Thompson at bham.ac.uk  Mon Oct  5 09:40:56 2020
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Mon, 5 Oct 2020 08:40:56 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
Message-ID: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>

>    I now need to check IBM are not going to throw a wobbler down the line 
>    if I need to get support before deploying it to the DSS-G nodes :-)

I know there were a lot of other emails about this ...

I think you maybe want to be careful doing this. Whilst it might work when you setup the DSS-G like this, remember that the memory usage you are seeing at this point in time may not be what you always need. For example if you fail-over the recovery groups, you need to have enough free memory to handle this. E.g. a node failure, or more likely you are upgrading the building blocks.

Personally I wouldn't run other things like this on my DSS-G storage nodes. We do run e.g. nrpe monitoring to collect and report faults, but this is pretty lightweight compared to everything else. They even removed support for running the gui packages on the IO nodes - the early DSS-G builds used the IO nodes for this, but now you need separate systems for this.

Simon


From jonathan.buzzard at strath.ac.uk  Mon Oct  5 12:44:48 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Mon, 5 Oct 2020 12:44:48 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>
References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
	<73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>
Message-ID: <905a0bdb-b6a1-90e4-bf57-ed8edae6fb7c@strath.ac.uk>


On 05/10/2020 07:27, Jordi Caubet Serrabou wrote:

 > ?Coming to the routing point, is there any reason why you need it ? I
 > mean, this is because GPFS trying to connect between compute nodes or
 > a reason outside GPFS scope ?
 > If the reason is GPFS,  imho best approach - without knowledge of the
 > licensing you have - would be to use separate clusters: a storage
 > cluster and two compute clusters.

The issue is that individual nodes want to talk to one another on the 
data interface. Which caught me by surprise as the cluster is set to 
admin mode central.

The admin interface runs over ethernet for all nodes on a specific VLAN 
which which is given 802.1p priority 5 (that's Voice, < 10 ms latency 
and jitter). That saved a bunch of switching and cabling as you don't 
need the extra interface for the admin traffic. The cabling already 
significantly restricts airflow for a compute rack as it is, without 
adding a whole bunch more for a barely used admin interface.

It's like the people who wrote the best practice about separate 
interface for the admin traffic know very little about networking to be 
frankly honest. This is all last century technology.

The nodes for undergraduate teaching only have a couple of 1Gb ethernet 
ports which would suck for storage usage. However they also have QDR 
Infiniband. That is because even though undergraduates can't run 
multinode jobs, on the old cluster the Lustre storage was delivered over 
Infiniband, so they got Infiniband cards.

 > Both compute clusters join using multicluster setup the storage
 > cluster. There is no need both compute clusters see each other, they
 > only need to see the storage cluster. One of the clusters using the
 > 10G, the other cluster using the IPoIB interface.
 > You need at least three quorum nodes in each compute cluster but if
 > licensing is per drive on the DSS, it is covered.

Three clusters is starting to get complicated from an admin perspective. 
The biggest issue is coordinating maintenance and keep sufficient quorum 
nodes up.

Maintenance on compute nodes is done via the job scheduler. I know some 
people think this is crazy, but it is in reality extremely elegant.

We can schedule a reboot on a node as soon as the current job has 
finished (usually used for firmware upgrades). Or we can schedule a job 
to run as root (usually for applying updates) as soon as the current job 
has finished. As such we have no way of knowing when that will be for a 
given node, and there is a potential for all three quorum nodes to be 
down at once.

Using this scheme we can seamlessly upgrade the nodes safe in the 
knowledge that a node is either busy and it's running on the current 
configuration or it has been upgraded and is running the new 
configuration. Consequently multinode jobs are guaranteed to have all 
nodes in the job running on the same configuration.

The alternative is to drain the node, but there is only a 23% chance the 
node will become available during working hours leading to a significant 
loss of compute time when doing maintenance compared to our existing 
scheme where the loss of compute time is only as long as the upgrade 
takes to install. Pretty much the only time we have idle nodes is when 
the scheduler is reserving nodes ready to schedule a multi node job.

Right now we have a single cluster with the quorum nodes being the two 
DSS-G nodes and the node used for backup. It is easy to ensure that 
quorum is maintained on these, they also all run real RHEL, where as the 
compute nodes run CentOS.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From carlz at us.ibm.com  Mon Oct  5 13:09:02 2020
From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com)
Date: Mon, 5 Oct 2020 12:09:02 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
Message-ID: <714B599F-D06D-4D03-98F3-A2BF6F7360DB@us.ibm.com>


Jordi wrote:
?Both compute clusters join using multicluster setup the storage cluster. There is no need both compute clusters see each other, they only need to see the storage cluster. One of the clusters using the 10G, the other cluster using the IPoIB interface.
You need at least three quorum nodes in each compute cluster but if licensing is per drive on the DSS, it is covered.?

As a side note: One of the reasons we designed capacity (per Disk or per TB) licensing the way we did was specifically so that you could make this kind of architectural decision on its own merits, without worrying about a licensing penalty.


Carl Zetie
Program Director
Offering Management
Spectrum Scale
----
(919) 473 3318 ][ Research Triangle Park
carlz at us.ibm.com

[signature_1243111775]


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/7eb5d683/attachment-0002.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 69558 bytes
Desc: image001.png
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/7eb5d683/attachment-0002.png>

From carlz at us.ibm.com  Mon Oct  5 13:20:25 2020
From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com)
Date: Mon, 5 Oct 2020 12:20:25 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
Message-ID: <288C3527-32BA-43E2-B5EF-E79CC5765424@us.ibm.com>

>> Mixing DSS and ESS in the same cluster is not a supported configuration.
>
> I know, it means you can never ever migrate your storage from DSS to ESS
> without a full backup and restore. Who with any significant amount of
> storage is going to want to do that? The logic behind this escapes me,
> or perhaps in that scenario IBM might relax the rules for the migration
> period.
>

We do indeed relax the rules temporarily for a migration.

The reasoning behind this rule is for support. Many Scale support issues - often the toughest ones - are not about a single node, but about the cluster or network as a whole. So if you have a mix of IBM systems with systems supported by an OEM (this applies to any OEM by the way, not just Lenovo) and a cluster-wide issue, who are you going to call. (Well, in practice you?re going to call IBM and we?ll do our best to help you despite limits on our knowledge of the OEM systems?).

--CZ


Carl Zetie
Program Director
Offering Management
Spectrum Scale
----
(919) 473 3318 ][ Research Triangle Park
carlz at us.ibm.com

[signature_386371469]


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/8629eaed/attachment-0002.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 69558 bytes
Desc: image001.png
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201005/8629eaed/attachment-0002.png>

From jonathan.buzzard at strath.ac.uk  Mon Oct  5 14:39:12 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Mon, 5 Oct 2020 14:39:12 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>
References: <38bbcd24-4cc2-2abe-754f-3017d71c1eaa@strath.ac.uk>
	<73AEC0BB-C67B-4863-A44F-6D32FDA56FEB@bham.ac.uk>
Message-ID: <abf37ce3-aa29-b4e1-fab6-12673b7aad67@strath.ac.uk>

On 05/10/2020 09:40, Simon Thompson wrote:
>> I now need to check IBM are not going to throw a wobbler down the
>> line if I need to get support before deploying it to the DSS-G
>> nodes :-)
> 
> I know there were a lot of other emails about this ...
> 
> I think you maybe want to be careful doing this. Whilst it might work
> when you setup the DSS-G like this, remember that the memory usage
> you are seeing at this point in time may not be what you always need.
> For example if you fail-over the recovery groups, you need to have
> enough free memory to handle this. E.g. a node failure, or more
> likely you are upgrading the building blocks.

I think there is a lack of understanding on exactly how light weight 
keepalived is.

It's the same code as on my routers which are admittedly different CPU's 
(MIPS to be precise) but memory usage (taking out shared memory usage - 
libc for example is loaded anyway) is under 200KB. A bash shell uses 
more memory...

> 
> Personally I wouldn't run other things like this on my DSS-G storage
> nodes. We do run e.g. nrpe monitoring to collect and report faults,
> but this is pretty lightweight compared to everything else. They even
> removed support for running the gui packages on the IO nodes - the
> early DSS-G builds used the IO nodes for this, but now you need
> separate systems for this.
> 

And keepalived is in the same range as nrpe, which you do run :-) I have 
seen nrpe get out of hand and consume significant amounts of resources 
on a machine; the machine was ground to halt due to nrpe. One of the 
standard plugins was failing and sitting their busy waiting. Every five 
minutes it ran again. It of course decided to wait till ~7pm on a Friday 
to go wonky. By mid morning on Saturday it was virtually unresponsive, 
several minutes to get a shell...

I would note that you can run keepalived quite happily on an Ubiquiti 
EdgeRouter X which has a dual core 880 MHz MIPS CPU with 256MB of RAM. 
Mikrotik have models with similar specs that run it too.

On a dual Xeon Gold 6142 machine the usage of RAM and CPU by keepalived 
is noise.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From committee at io500.org  Thu Oct  1 17:40:00 2020
From: committee at io500.org (committee at io500.org)
Date: Thu, 01 Oct 2020 10:40:00 -0600
Subject: [gpfsug-discuss] IO500 SC20 Call for Submission
Message-ID: <4a20ed6ae985a25c69d953e1ea633d62@io500.org>

CALL FOR IO500 SUBMISSION

Deadline: 30 October 2020 AoE 

Stabilization period: 1st October -- 9th October 2020 AoE 

The IO500 [1] is now accepting and encouraging submissions for the
upcoming 7th IO500 list, to be revealed at the IO500 Virtual BOF during
SC20. Once again, we are also accepting submissions to the 10 Node I/O
Challenge to encourage submission of small scale results. The new ranked
lists will be announced at our Virtual SC20 BoF. We hope to see you, and
your results, there.  

A new change for the upcoming submission procedure is the introduction
of a stabilization period that aims to harden the benchmark. The final
benchmark is released at the end of this period. During the
stabilization we encourage the community to test the proper execution of
the benchmark and provide us with feedback. We will apply bug fixes to
the code base and expect that results obtained will be valid as full
submission. We also continue with another list for the Student Cluster
Competition, since IO500 is used during this competition. 

Also new this year is that we have partnered with Anthony Kougkas' team
at Illinois Institute of Technology to evaluate the submission metadata
describing the storage system on which the test was run to improve the
quality and usefulness of the data IO500 collects. You may be contacted
by one of his students to clarify one or more of the metadata items from
your submission(s). We would appreciate, but do not require, your
cooperation to help improve the submission metadata quality. Results
from their work will be fed back to improve our submission process for
future lists. 

The IO500 benchmark suite is designed to be easy to run, and the
community has multiple active support channels to help with any
questions. Please submit results from your system, and we look forward
to seeing many of you at SC20! Please note that submissions of all sizes
are welcome, including multiple submissions from different storage
systems/tiers at a single site.  The website has customizable sorting so
it is possible to submit on a small system and still get a very good
per-client score, for example. Additionally, the list is about much more
than just the raw rank; all submissions help the community by collecting
and publishing a wider corpus of data. More details below. 

Following the success of the Top500 in collecting and analyzing
historical trends in supercomputer technology and evolution, the IO500
[1] was created in 2017, published its first list at SC17, and has grown
continuously since then. The need for such an initiative has long been
known within High-Performance Computing; however, defining appropriate
benchmarks had long been challenging. Despite this challenge, the
community, after long and spirited discussion, finally reached consensus
on a suite of benchmarks and a metric for resolving the scores into a
single ranking. 

The multi-fold goals of the benchmark suite are as follows: 

 	* Maximizing simplicity in running the benchmark suite
 	* Encouraging complexity in tuning for performance
 	* Allowing submitters to highlight their "hero run" performance
numbers
 	* Forcing submitters to simultaneously report performance for
challenging IO patterns.

Specifically, the benchmark suite includes a hero-run of both IOR and
mdtest configured however possible to maximize performance and establish
an upper-bound for performance. It also includes an IOR and mdtest run
with highly prescribed parameters in an attempt to determine a
lower-bound on the performance. Finally, it includes a namespace search,
as this has been determined to be a highly sought-after feature in HPC
storage systems that have historically not been well-measured.
Submitters are encouraged to share their tuning insights for
publication. 

The goals of the community are also multi-fold: 

 	* Gather historical data for the sake of analysis and to aid
predictions of storage futures
 	* Collect tuning information to share valuable performance
optimizations across the community
 	* Encourage vendors and designers to optimize for workloads beyond
"hero runs"
 	* Establish bounded expectations for users, procurers, and
administrators

10 NODE I/O CHALLENGE

The 10 Node Challenge is conducted using the regular IO500 benchmark,
however, with the rule that exactly 10 client nodes must be used to run
the benchmark. You may use any shared storage with, e.g., any number of
servers. When submitting for the IO500 list, you can opt-in for
"Participate in the 10 compute node challenge only", then we will not
include the results into the ranked list. Other 10-node node submissions
will be included in the full list and in the ranked list. We will
announce the result in a separate derived list and in the full list but
not on the ranked IO500 list at https://io500.org/ [2]  

BIRDS-OF-A-FEATHER

Once again, we encourage you to submit [1], to join our community, and
to attend our virtual BoF "The IO500 and the Virtual Institute of I/O"
at SC20, where we will announce the new IO500 list, the 10 node
challenge list, and the Student Cluster Competition list. We look
forward to answering any questions or concerns you might have. 

 	* [1] http://www.vi4io.org/io500/submission [3] 

Thanks, 

The IO500 Committee <committee at io500.org> 

 
Links:
------
[1] http://io500.org/
[2] https://io500.org/
[3] http://www.vi4io.org/io500/submission
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201001/7611b021/attachment-0002.htm>

From valdis.kletnieks at vt.edu  Wed Oct  7 00:45:46 2020
From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks)
Date: Tue, 06 Oct 2020 19:45:46 -0400
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
References: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
Message-ID: <138651.1602027946@turing-police>

On Sat, 03 Oct 2020 10:55:05 -0000, "Andrew Beattie" said:

> Why do you need to run any kind of monitoring client on an IO server the
> GUI / performance monitor already does all of that work for you and
> collects the data on the dedicated EMS server.

Does *ALL* that work for me?

Will it toss you an alert if your sshd goes away, or if somebody's tossing
packets that iptables is blocking for good reasons, or any of the many other
things that a competent sysadmin wants to be alerted on that aren't GPFS, but
which are things that Nagios and Zabbix and similar tools were invented
to track?


-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201006/822e7ca6/attachment-0002.sig>

From S.J.Thompson at bham.ac.uk  Wed Oct  7 11:28:55 2020
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Wed, 7 Oct 2020 10:28:55 +0000
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <138651.1602027946@turing-police>
References: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
	<138651.1602027946@turing-police>
Message-ID: <FB188F2A-6E9D-49C5-8663-87547FD35BE3@bham.ac.uk>

Agreed ...

Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*.
Tell me that kswapd is having one of those days.
Tell me rsyslogd has stopped sending for some reason.
Tell me if there are long waiters on the hosts.
Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ...

Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems...

Simon

?On 07/10/2020, 00:45, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Valdis Kl?tnieks" <gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu> wrote:

    On Sat, 03 Oct 2020 10:55:05 -0000, "Andrew Beattie" said:

    > Why do you need to run any kind of monitoring client on an IO server the
    > GUI / performance monitor already does all of that work for you and
    > collects the data on the dedicated EMS server.

    Does *ALL* that work for me?

    Will it toss you an alert if your sshd goes away, or if somebody's tossing
    packets that iptables is blocking for good reasons, or any of the many other
    things that a competent sysadmin wants to be alerted on that aren't GPFS, but
    which are things that Nagios and Zabbix and similar tools were invented
    to track?


From jonathan.buzzard at strath.ac.uk  Wed Oct  7 13:14:45 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Wed, 7 Oct 2020 13:14:45 +0100
Subject: [gpfsug-discuss] Services on DSS/ESS nodes
In-Reply-To: <FB188F2A-6E9D-49C5-8663-87547FD35BE3@bham.ac.uk>
References: <OF37F2C74A.D272F48C-ON002585F6.003BF987-1601722505378@notes.na.collabserv.com>
	<138651.1602027946@turing-police>
	<FB188F2A-6E9D-49C5-8663-87547FD35BE3@bham.ac.uk>
Message-ID: <e8cf9330-cf1f-e7ad-777e-6aeaade9b29b@strath.ac.uk>

On 07/10/2020 11:28, Simon Thompson wrote:
> Agreed ...
> 
> Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*.
> Tell me that kswapd is having one of those days.
> Tell me rsyslogd has stopped sending for some reason.
> Tell me if there are long waiters on the hosts.
> Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ...
> 
> Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems...
> 

The problem is the developers know as much about looking after a system 
in the real world as a tea leaf knows the history of the East India 
Company. IMHO to even ask the question shows a total lack of 
understanding of the issue.

Consequently developers in their ivory towers have a habit of developing 
things that are as useful as a chocolate tea pot. Which putting it 
bluntly a competent sysadmins makes them look like a bunch of twits. I 
would note this is not a problem unique to IBM, it's developers in general.

The appropriate course of action would be not for IBM to develop a 
monitoring tool of their own but to provide a bunch of plugins for the 
popular monitoring tools that customers will already be using to monitor 
their whole IT estate.

Heaven forbid they could even run a poll to find out which ones the 
actual customers of their products are interested in rather than wasting 
effort developing software their customers are not actually interested in.

For my purposes there is I think an alternative. The actual routing of 
the IP packets is not a service, it's a kernel configuration to have the 
kernel route that packets :-) Keepalived just manages a floating IP 
address. There are other options to achieve this. They are clunkier but 
they side step IBM's silly rules.

I would however note at this point that at lots of sites all routing in 
the data centre is done using BGP. It comes in part out of the zero 
trust paradigm. I guess apparently running fail2ban is not permitted 
either. Can I even run firewalld? As you can seen a nothing else policy 
quickly becomes unsustainable IMHO.

There is a disjuncture between the developers in their ivory towers and 
the real world.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From kkr at lbl.gov  Tue Oct 13 22:34:23 2020
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Tue, 13 Oct 2020 14:34:23 -0700
Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a
	panel?
Message-ID: <B8A6C2AA-B4B0-473B-B7EC-C751F0EC9D7C@lbl.gov>

Hi all,

	By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we?re planning a couple 90-minute sessions and would like to do a panel during one of them. We?ll hope to do live Q&A, like an in-person Ask Me Anything session, but it?s probably a good idea to have a bank of questions ready as well, Also, that way your question may get asked, even if you can?t make the live session ?we?ll record these sessions for later viewing. 

	So, please send your questions for the panel and we?ll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots:

November 16th - 8:00 AM Pacific/3:00 PM UTC 

November 18th - 8:00 AM Pacific/3:00 PM UTC 

Best,
Kristy

Kristy Kallback-Rose
Senior HPC Storage Systems Analyst
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201013/4bb40577/attachment-0002.htm>

From juergen.hannappel at desy.de  Wed Oct 21 17:13:01 2020
From: juergen.hannappel at desy.de (Hannappel, Juergen)
Date: Wed, 21 Oct 2020 18:13:01 +0200 (CEST)
Subject: [gpfsug-discuss] Mounting an nfs share on a CES node
Message-ID: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de>

Hi,
I have a CES node exporting some filesystems vis smb and ganesha in a standard CES setup.
Now I want to mount a nfs share from a different, non-CES server on this CES node.
This did not work:
mount -o -fstype=nfs4,minorversion=1,rw,rsize=65536,wsize=65536 some.other.server:/some/path /mnt/
mount.nfs: mount to NFS server 'some.other.server:/some/path' failed: RPC Error: Program unavailable

Does the CES software stack interfere with the nfs client setup? It seems that at least with
rpc-statd there is some conflict:

 systemctl status rpc-statd
? rpc-statd.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2020-10-21 17:48:21 CEST; 22min ago
  Process: 19896 ExecStart=/usr/sbin/rpc.statd $STATDARGS (code=exited, status=1/FAILURE)

Oct 21 17:48:21 mynode systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Oct 21 17:48:21 mynode rpc.statd[19896]: Statd service already running!
Oct 21 17:48:21 mynode systemd[1]: rpc-statd.service: control process exited, code=exited status=1
Oct 21 17:48:21 mynode systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
Oct 21 17:48:21 mynode systemd[1]: Unit rpc-statd.service entered failed state.
Oct 21 17:48:21 mynode systemd[1]: rpc-statd.service failed.
-- 
Dr. J?rgen Hannappel  DESY/IT    Tel.  : +49 40 8998-4616


From mnaineni at in.ibm.com  Thu Oct 22 04:38:59 2020
From: mnaineni at in.ibm.com (Malahal R Naineni)
Date: Thu, 22 Oct 2020 03:38:59 +0000
Subject: [gpfsug-discuss] Mounting an nfs share on a CES node
In-Reply-To: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de>
References: <1195503772.13156505.1603296781279.JavaMail.zimbra@desy.de>
Message-ID: <OF616CC8F9.AD263CD8-ON00258609.0012EE72-00258609.00140C7B@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201022/5b5db3bb/attachment-0002.htm>

From andi at christiansen.xxx  Tue Oct 27 11:46:02 2020
From: andi at christiansen.xxx (Andi Christiansen)
Date: Tue, 27 Oct 2020 12:46:02 +0100 (CET)
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
Message-ID: <1109480230.484366.1603799162955@privateemail.com>

Hi all,


We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API..


Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API.


We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me.

The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl.


If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ?

Thank you in advance!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201027/2ffb4d78/attachment-0002.htm>

From NISHAAN at za.ibm.com  Tue Oct 27 13:38:01 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Tue, 27 Oct 2020 15:38:01 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <1109480230.484366.1603799162955@privateemail.com>
References: <1109480230.484366.1603799162955@privateemail.com>
Message-ID: <OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>

Hi Andi

The current S3 compatibility in Spectrum Scale is delivered via the Swift3
middleware. This middleware has since been replaced by s3api in later
versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release
which will include the new s3api middleware.

I've tested the S3 compatibility with a few applications including Spectrum
Scale itself (i.e. Cloud Data Sharing to another Scale Object store using
S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran
a few application tools to test for an S3 compliant object stores and again
had no issues.

You can use s3compat to test the current compatibility.. Or you can check
here for the current compatibility..
https://docs.openstack.org/swift/latest/s3_compat.html

Not sure if there is any other way to talk HTTPS without using HAProxy.

In any case, I've documented the process to setup an S3 compliant object
store including supporting virtual-hosted style bucket addressing which you
can find here..

https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/

Most storage vendors including minio would not support the full S3 API
stack as alot of the calls are specific to AWS (like billing stuff etc.).

Anyway, good luck with your testing.

Kind Regards

Nishaan Docrat
Client Technical Specialist - Storage Systems
IBM Systems Hardware

Work: +27 (0)11 302 5001
Mobile: +27 (0)81 040 3793
Email: nishaan at za.ibm.com


From:	Andi Christiansen <andi at christiansen.xxx>
To:	"gpfsug-discuss at spectrumscale.org"
            <gpfsug-discuss at spectrumscale.org>
Date:	2020/10/27 13:59
Subject:	[EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API.
Sent by:	gpfsug-discuss-bounces at spectrumscale.org


Hi all,


We have over a longer period used the S3 API within spectrum Scale.. And
that has shown that it does not support very many applications because of
limitations of the API..


Has anyone got any experience with any other product we can deploy on-top
of Spectrum Scale that will give us a true S3 API with full functionalities
and able to answer on port 443? As of now we use HAProxy to forware ssl
request back and forth from Scale S3 API.


We have looked at MinIO which seems to be fairly simple and maybe might
solve a lot of incompatibilities with clients software. But the product
seems to be very badly documented at least for me.

The idea is basically that a client uses their backup application(rubrik,
veeam etc.) to connect to a domain(for example backup.mycompany.com) with
their access and secret key and have access to their bucket only. and it
must be over https/ssl.


If someone has any knowledge to minio or any other product that might solve
our problem I will be glad to hear from you! ?

Thank you in advance!
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201027/70cc1bc2/attachment-0002.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 52733301.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201027/70cc1bc2/attachment-0002.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201027/70cc1bc2/attachment-0002.gif>

From andi at christiansen.xxx  Wed Oct 28 06:24:52 2020
From: andi at christiansen.xxx (Andi Christiansen)
Date: Wed, 28 Oct 2020 07:24:52 +0100 (CET)
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>
References: <1109480230.484366.1603799162955@privateemail.com>
	<OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>
Message-ID: <2126571944.509878.1603866292369@privateemail.com>

Hi Nishaan,

Thanks for you reply.

When you say 5.1? is that 5.1.x.x or 5.0.5.1?

Some of the limitations we have encountered is the multipart upload not being supported and some md5sum that the s3 api does that veeam actually dont like. also interms of the management on the Scale GUI, that has to be on one of the S3 CES nodes in order to be able to show project, container etc... but when you have a HAProxy for enabling SSL then a GUI is not available as they both use port 443? 


i know min.io is not the full stack of S3 API commands but as far as i can read it comes with more features out of the box than Scale S3 does, multipart for an example...

I looked through your documentation and its very close to what we have set up today and found to not work... 

If multipart uploads would be supported today on scale S3 i would think about still using scale for the s3 part but as i expect that you talk about 5.1.x.x i dont see that being released any time soon? and dont know if that is actually going to be supported in that release then i cant wait for that to happen..

Best Regards
Andi Christiansen

>     On 10/27/2020 2:38 PM Nishaan Docrat <nishaan at za.ibm.com> wrote:
> 
> 
> 
>     Hi Andi
> 
>     The current S3 compatibility in Spectrum Scale is delivered via the Swift3 middleware. This middleware has since been replaced by s3api in later versions of Swift. Spectrum Scale 5.1 will make use of Swift Train release which will include the new s3api middleware.
> 
>     I've tested the S3 compatibility with a few applications including Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale Object store using S3 API) and Spectrum Protect etc. and haven't had any issue. I've also ran a few application tools to test for an S3 compliant object stores and again had no issues.
> 
>     You can use s3compat to test the current compatibility.. Or you can check here for the current compatibility.. https://docs.openstack.org/swift/latest/s3_compat.html https://docs.openstack.org/swift/latest/s3_compat.html
> 
>     Not sure if there is any other way to talk HTTPS without using HAProxy.
> 
>     In any case, I've documented the process to setup an S3 compliant object store including supporting virtual-hosted style bucket addressing which you can find here..
> 
>     https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/ https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/
> 
>     Most storage vendors including minio would not support the full S3 API stack as alot of the calls are specific to AWS (like billing stuff etc.).
> 
>     Anyway, good luck with your testing.
> 
>     Kind Regards
> 
>     Nishaan Docrat
>     Client Technical Specialist - Storage Systems
>     IBM Systems Hardware
> 
>     Work: +27 (0)11 302 5001
>     Mobile: +27 (0)81 040 3793
>     Email: nishaan at za.ibm.com http://www.ibm.com/storage
> 
> 
> 
>     [Inactive hide details for Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API withi]Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a longer period used the S3 API within spectrum Scale.. And that has shown that
> 
>     From: Andi Christiansen <andi at christiansen.xxx>
>     To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
>     Date: 2020/10/27 13:59
>     Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API.
>     Sent by: gpfsug-discuss-bounces at spectrumscale.org
> 
> 
>     ---------------------------------------------
> 
> 
> 
>     Hi all,
> 
> 
> 
>     We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API..
> 
> 
> 
>     Has anyone got any experience with any other product we can deploy on-top of Spectrum Scale that will give us a true S3 API with full functionalities and able to answer on port 443? As of now we use HAProxy to forware ssl request back and forth from Scale S3 API.
> 
> 
> 
>     We have looked at MinIO which seems to be fairly simple and maybe might solve a lot of incompatibilities with clients software. But the product seems to be very badly documented at least for me.
> 
>     The idea is basically that a client uses their backup application(rubrik, veeam etc.) to connect to a domain(for example backup.mycompany.com) with their access and secret key and have access to their bucket only. and it must be over https/ssl.
> 
> 
> 
>     If someone has any knowledge to minio or any other product that might solve our problem I will be glad to hear from you! ?
> 
>     Thank you in advance!
>     _______________________________________________
>     gpfsug-discuss mailing list
>     gpfsug-discuss at spectrumscale.org
>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
> 
> 
> 
>     _______________________________________________
>     gpfsug-discuss mailing list
>     gpfsug-discuss at spectrumscale.org
>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/b16944e4/attachment-0002.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 52733301.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/b16944e4/attachment-0002.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/b16944e4/attachment-0002.gif>

From NISHAAN at za.ibm.com  Wed Oct 28 06:45:29 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Wed, 28 Oct 2020 08:45:29 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <2126571944.509878.1603866292369@privateemail.com>
References: <1109480230.484366.1603799162955@privateemail.com>
	<OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>
	<2126571944.509878.1603866292369@privateemail.com>
Message-ID: <OF4063F396.B7DA9D54-ON4225860F.0023CA79-4225860F.00251FA3@notes.na.collabserv.com>

Hi Andi

The s3api middleware does support multipart uploads..

https://docs.openstack.org/swift/latest/s3_compat.html

The current version of Swift (PIKE) that is bundled with Spectrum Scale
5.0.X doesn't..
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adm_ManagingOpenStackACLsviaAmazonS3API.htm

According to the Spectrum Scale Roadmap, 5.1 is due out 2H20.. Not sure if
someone from development can confirm the GA date.

Does Veeam have a test utility? You could always test it using the current
Swift AIO or if you can provide me with a test utility I can test that for
you.

Kind Regards

Nishaan Docrat
Client Technical Specialist - Storage Systems
IBM Systems Hardware

Work: +27 (0)11 302 5001
Mobile: +27 (0)81 040 3793
Email: nishaan at za.ibm.com


From:	Andi Christiansen <andi at christiansen.xxx>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>,
            Nishaan Docrat <NISHAAN at za.ibm.com>
Date:	2020/10/28 08:24
Subject:	[EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.


Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or
5.0.5.1? Some...
                                                                            
                                                                            
                   This Message Is From an External Sender                  
                   This message came from outside your organization.        
                                                                            
                                                                            
Hi Nishaan,

Thanks for you reply.

When you say 5.1? is that 5.1.x.x or 5.0.5.1?

Some of the limitations we have encountered is the multipart upload not
being supported and some md5sum that the s3 api does that veeam actually
dont like. also interms of the management on the Scale GUI, that has to be
on one of the S3 CES nodes in order to be able to show project, container
etc... but when you have a HAProxy for enabling SSL then a GUI is not
available as they both use port 443?


i know min.io is not the full stack of S3 API commands but as far as i can
read it comes with more features out of the box than Scale S3 does,
multipart for an example...

I looked through your documentation and its very close to what we have set
up today and found to not work...

If multipart uploads would be supported today on scale S3 i would think
about still using scale for the s3 part but as i expect that you talk about
5.1.x.x i dont see that being released any time soon? and dont know if that
is actually going to be supported in that release then i cant wait for that
to happen..

Best Regards
Andi Christiansen
      On 10/27/2020 2:38 PM Nishaan Docrat <nishaan at za.ibm.com> wrote:


      Hi Andi

      The current S3 compatibility in Spectrum Scale is delivered via the
      Swift3 middleware. This middleware has since been replaced by s3api
      in later versions of Swift. Spectrum Scale 5.1 will make use of Swift
      Train release which will include the new s3api middleware.

      I've tested the S3 compatibility with a few applications including
      Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale
      Object store using S3 API) and Spectrum Protect etc. and haven't had
      any issue. I've also ran a few application tools to test for an S3
      compliant object stores and again had no issues.

      You can use s3compat to test the current compatibility.. Or you can
      check here for the current compatibility..
      https://docs.openstack.org/swift/latest/s3_compat.html

      Not sure if there is any other way to talk HTTPS without using
      HAProxy.

      In any case, I've documented the process to setup an S3 compliant
      object store including supporting virtual-hosted style bucket
      addressing which you can find here..

      https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/


      Most storage vendors including minio would not support the full S3
      API stack as alot of the calls are specific to AWS (like billing
      stuff etc.).

      Anyway, good luck with your testing.

      Kind Regards

      Nishaan Docrat
      Client Technical Specialist - Storage Systems
      IBM Systems Hardware

      Work: +27 (0)11 302 5001
      Mobile: +27 (0)81 040 3793
      Email: nishaan at za.ibm.com


      Inactive hide details for Andi Christiansen ---2020/10/27
      13:59:30---Hi all, We have over a longer period used the S3 API withi
      Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a
      longer period used the S3 API within spectrum Scale.. And that has
      shown that

      From: Andi Christiansen <andi at christiansen.xxx>
      To: "gpfsug-discuss at spectrumscale.org"
      <gpfsug-discuss at spectrumscale.org>
      Date: 2020/10/27 13:59
      Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API.
      Sent by: gpfsug-discuss-bounces at spectrumscale.org


      Hi all,


      We have over a longer period used the S3 API within spectrum Scale..
      And that has shown that it does not support very many applications
      because of limitations of the API..


      Has anyone got any experience with any other product we can deploy
      on-top of Spectrum Scale that will give us a true S3 API with full
      functionalities and able to answer on port 443? As of now we use
      HAProxy to forware ssl request back and forth from Scale S3 API.


      We have looked at MinIO which seems to be fairly simple and maybe
      might solve a lot of incompatibilities with clients software. But the
      product seems to be very badly documented at least for me.

      The idea is basically that a client uses their backup application
      (rubrik, veeam etc.) to connect to a domain(for example
      backup.mycompany.com) with their access and secret key and have
      access to their bucket only. and it must be over https/ssl.


      If someone has any knowledge to minio or any other product that might
      solve our problem I will be glad to hear from you! ?

      Thank you in advance!
      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at spectrumscale.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss


      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at spectrumscale.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/4c5ddab4/attachment-0002.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 19991351.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/4c5ddab4/attachment-0002.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/4c5ddab4/attachment-0002.gif>

From NISHAAN at za.ibm.com  Wed Oct 28 07:12:55 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Wed, 28 Oct 2020 09:12:55 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <2126571944.509878.1603866292369@privateemail.com>
References: <1109480230.484366.1603799162955@privateemail.com>
	<OF408A679D.4CEDC2ED-ON4225860E.00499000-4225860E.004AE49C@notes.na.collabserv.com>
	<2126571944.509878.1603866292369@privateemail.com>
Message-ID: <OF5F31EC6E.6235CA80-ON4225860F.0027378A-4225860F.0027A28C@notes.na.collabserv.com>

Hi Andi

Sorry forgot to mention that I was told 5.1 will include the Swift Train
release (2.23). The change from swift3 middleware to s3api was done in the
Queens release (2.18)  so 5.1 will definitely have multipart support.

Kind Regards

Nishaan Docrat
Client Technical Specialist - Storage Systems
IBM Systems Hardware

Work: +27 (0)11 302 5001
Mobile: +27 (0)81 040 3793
Email: nishaan at za.ibm.com


From:	Andi Christiansen <andi at christiansen.xxx>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>,
            Nishaan Docrat <NISHAAN at za.ibm.com>
Date:	2020/10/28 08:24
Subject:	[EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.


Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or
5.0.5.1? Some...
                                                                            
                                                                            
                   This Message Is From an External Sender                  
                   This message came from outside your organization.        
                                                                            
                                                                            
Hi Nishaan,

Thanks for you reply.

When you say 5.1? is that 5.1.x.x or 5.0.5.1?

Some of the limitations we have encountered is the multipart upload not
being supported and some md5sum that the s3 api does that veeam actually
dont like. also interms of the management on the Scale GUI, that has to be
on one of the S3 CES nodes in order to be able to show project, container
etc... but when you have a HAProxy for enabling SSL then a GUI is not
available as they both use port 443?


i know min.io is not the full stack of S3 API commands but as far as i can
read it comes with more features out of the box than Scale S3 does,
multipart for an example...

I looked through your documentation and its very close to what we have set
up today and found to not work...

If multipart uploads would be supported today on scale S3 i would think
about still using scale for the s3 part but as i expect that you talk about
5.1.x.x i dont see that being released any time soon? and dont know if that
is actually going to be supported in that release then i cant wait for that
to happen..

Best Regards
Andi Christiansen
      On 10/27/2020 2:38 PM Nishaan Docrat <nishaan at za.ibm.com> wrote:


      Hi Andi

      The current S3 compatibility in Spectrum Scale is delivered via the
      Swift3 middleware. This middleware has since been replaced by s3api
      in later versions of Swift. Spectrum Scale 5.1 will make use of Swift
      Train release which will include the new s3api middleware.

      I've tested the S3 compatibility with a few applications including
      Spectrum Scale itself (i.e. Cloud Data Sharing to another Scale
      Object store using S3 API) and Spectrum Protect etc. and haven't had
      any issue. I've also ran a few application tools to test for an S3
      compliant object stores and again had no issues.

      You can use s3compat to test the current compatibility.. Or you can
      check here for the current compatibility..
      https://docs.openstack.org/swift/latest/s3_compat.html

      Not sure if there is any other way to talk HTTPS without using
      HAProxy.

      In any case, I've documented the process to setup an S3 compliant
      object store including supporting virtual-hosted style bucket
      addressing which you can find here..

      https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/


      Most storage vendors including minio would not support the full S3
      API stack as alot of the calls are specific to AWS (like billing
      stuff etc.).

      Anyway, good luck with your testing.

      Kind Regards

      Nishaan Docrat
      Client Technical Specialist - Storage Systems
      IBM Systems Hardware

      Work: +27 (0)11 302 5001
      Mobile: +27 (0)81 040 3793
      Email: nishaan at za.ibm.com


      Inactive hide details for Andi Christiansen ---2020/10/27
      13:59:30---Hi all, We have over a longer period used the S3 API withi
      Andi Christiansen ---2020/10/27 13:59:30---Hi all, We have over a
      longer period used the S3 API within spectrum Scale.. And that has
      shown that

      From: Andi Christiansen <andi at christiansen.xxx>
      To: "gpfsug-discuss at spectrumscale.org"
      <gpfsug-discuss at spectrumscale.org>
      Date: 2020/10/27 13:59
      Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3 API.
      Sent by: gpfsug-discuss-bounces at spectrumscale.org


      Hi all,


      We have over a longer period used the S3 API within spectrum Scale..
      And that has shown that it does not support very many applications
      because of limitations of the API..


      Has anyone got any experience with any other product we can deploy
      on-top of Spectrum Scale that will give us a true S3 API with full
      functionalities and able to answer on port 443? As of now we use
      HAProxy to forware ssl request back and forth from Scale S3 API.


      We have looked at MinIO which seems to be fairly simple and maybe
      might solve a lot of incompatibilities with clients software. But the
      product seems to be very badly documented at least for me.

      The idea is basically that a client uses their backup application
      (rubrik, veeam etc.) to connect to a domain(for example
      backup.mycompany.com) with their access and secret key and have
      access to their bucket only. and it must be over https/ssl.


      If someone has any knowledge to minio or any other product that might
      solve our problem I will be glad to hear from you! ?

      Thank you in advance!
      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at spectrumscale.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss


      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at spectrumscale.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/0aaf9fcd/attachment-0002.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 17810834.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/0aaf9fcd/attachment-0002.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/0aaf9fcd/attachment-0002.gif>

From luis.bolinches at fi.ibm.com  Wed Oct 28 07:15:21 2020
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Wed, 28 Oct 2020 07:15:21 +0000
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <OF4063F396.B7DA9D54-ON4225860F.0023CA79-4225860F.00251FA3@notes.na.collabserv.com>
Message-ID: <OF06DC9BE9.BBFA65F5-ON0025860F.0027441B-0025860F.0027DBD0@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment-0002.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.1__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment-0004.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.2__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment-0004.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.1__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment-0005.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.2__=CDBB0C9CDFB04CE98f9e8a93df938690918cCDB at .gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/91a8e7e5/attachment-0005.gif>

From NISHAAN at za.ibm.com  Wed Oct 28 07:45:45 2020
From: NISHAAN at za.ibm.com (Nishaan Docrat)
Date: Wed, 28 Oct 2020 09:45:45 +0200
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <OF06DC9BE9.BBFA65F5-ON0025860F.0027441B-0025860F.0027DBD0@notes.na.collabserv.com>
References: <OF4063F396.B7DA9D54-ON4225860F.0023CA79-4225860F.00251FA3@notes.na.collabserv.com>
	<OF06DC9BE9.BBFA65F5-ON0025860F.0027441B-0025860F.0027DBD0@notes.na.collabserv.com>
Message-ID: <OF047A1414.5894B72F-ON4225860F.002A5F54-4225860F.002AA438@notes.na.collabserv.com>

Hi Luis

Thanks for your reply.. It should address Andi's issue as the underlying
Swift version is what is important and the functionality he needs is in the
latest releases (I was told 5.1 includes Swift Train which is the latest
version).

Am sure there is a beta program for Spectrum Scale.. Perhaps Andi should
speak to his software sales rep and ask to be included on it to get access
so that he can test.

Kind Regards

Nishaan Docrat
Client Technical Specialist - Storage Systems
IBM Systems Hardware

Work: +27 (0)11 302 5001
Mobile: +27 (0)81 040 3793
Email: nishaan at za.ibm.com


From:	"Luis Bolinches" <luis.bolinches at fi.ibm.com>
To:	gpfsug-discuss at spectrumscale.org
Cc:	gpfsug-discuss at spectrumscale.org
Date:	2020/10/28 09:29
Subject:	[EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.
Sent by:	gpfsug-discuss-bounces at spectrumscale.org


Hi

5.1.x is going GA very soon (TM). Would it address the issues Andi sees on
his environment or not I cannot say.

I can take it with Andi for more details on the GA date


--
Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations /
Salutacions
Luis Bolinches
Consultant IT Specialist
IBM Spectrum Scale development
Mobile Phone: +358503112585

https://www.youracclaim.com/user/luis-bolinches

Ab IBM Finland Oy
Laajalahdentie 23
00330 Helsinki
Uusimaa - Finland

"If you always give you will always have" --  Anonymous


 ----- Original message -----
 From: "Nishaan Docrat" <NISHAAN at za.ibm.com>
 Sent by: gpfsug-discuss-bounces at spectrumscale.org
 To: Andi Christiansen <andi at christiansen.xxx>
 Cc: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.
 Date: Wed, Oct 28, 2020 08:47


 Hi Andi

 The s3api middleware does support multipart uploads..

 https://docs.openstack.org/swift/latest/s3_compat.html

 The current version of Swift (PIKE) that is bundled with Spectrum Scale
 5.0.X doesn't..
 https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adm_ManagingOpenStackACLsviaAmazonS3API.htm


 According to the Spectrum Scale Roadmap, 5.1 is due out 2H20.. Not sure if
 someone from development can confirm the GA date.

 Does Veeam have a test utility? You could always test it using the current
 Swift AIO or if you can provide me with a test utility I can test that for
 you.

 Kind Regards

 Nishaan Docrat
 Client Technical Specialist - Storage Systems
 IBM Systems Hardware

 Work: +27 (0)11 302 5001
 Mobile: +27 (0)81 040 3793
 Email: nishaan at za.ibm.com


 Inactive hide details for Andi Christiansen ---2020/10/28 08:24:55---Hi
 Nishaan, Thanks for you reply.Andi Christiansen ---2020/10/28
 08:24:55---Hi Nishaan, Thanks for you reply.

 From: Andi Christiansen <andi at christiansen.xxx>
 To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>,
 Nishaan Docrat <NISHAAN at za.ibm.com>
 Date: 2020/10/28 08:24
 Subject: [EXTERNAL] Re: [gpfsug-discuss] Alternative to Scale S3 API.


 Hi Nishaan, Thanks for you reply. When you say 5.1? is that 5.1.x.x or
 5.0.5.1? Some...
                                                                            
                                                                            
               This Message Is From an External Sender                      
               This message came from outside your organization.            
                                                                            
                                                                            
 Hi Nishaan,

 Thanks for you reply.

 When you say 5.1? is that 5.1.x.x or 5.0.5.1?

 Some of the limitations we have encountered is the multipart upload not
 being supported and some md5sum that the s3 api does that veeam actually
 dont like. also interms of the management on the Scale GUI, that has to be
 on one of the S3 CES nodes in order to be able to show project, container
 etc... but when you have a HAProxy for enabling SSL then a GUI is not
 available as they both use port 443?


 i know min.io is not the full stack of S3 API commands but as far as i can
 read it comes with more features out of the box than Scale S3 does,
 multipart for an example...

 I looked through your documentation and its very close to what we have set
 up today and found to not work...

 If multipart uploads would be supported today on scale S3 i would think
 about still using scale for the s3 part but as i expect that you talk
 about 5.1.x.x i dont see that being released any time soon? and dont know
 if that is actually going to be supported in that release then i cant wait
 for that to happen..

 Best Regards
 Andi Christiansen
             On 10/27/2020 2:38 PM Nishaan Docrat <nishaan at za.ibm.com>
             wrote:


             Hi Andi

             The current S3 compatibility in Spectrum Scale is delivered
             via the Swift3 middleware. This middleware has since been
             replaced by s3api in later versions of Swift. Spectrum Scale
             5.1 will make use of Swift Train release which will include
             the new s3api middleware.

             I've tested the S3 compatibility with a few applications
             including Spectrum Scale itself (i.e. Cloud Data Sharing to
             another Scale Object store using S3 API) and Spectrum Protect
             etc. and haven't had any issue. I've also ran a few
             application tools to test for an S3 compliant object stores
             and again had no issues.

             You can use s3compat to test the current compatibility.. Or
             you can check here for the current compatibility..
             https://docs.openstack.org/swift/latest/s3_compat.html

             Not sure if there is any other way to talk HTTPS without using
             HAProxy.

             In any case, I've documented the process to setup an S3
             compliant object store including supporting virtual-hosted
             style bucket addressing which you can find here..

             https://www.linkedin.com/feed/update/urn:li:activity:6720227398756909056/


             Most storage vendors including minio would not support the
             full S3 API stack as alot of the calls are specific to AWS
             (like billing stuff etc.).

             Anyway, good luck with your testing.

             Kind Regards

             Nishaan Docrat
             Client Technical Specialist - Storage Systems
             IBM Systems Hardware

             Work: +27 (0)11 302 5001
             Mobile: +27 (0)81 040 3793
             Email: nishaan at za.ibm.com


             Inactive hide details for Andi Christiansen ---2020/10/27
             13:59:30---Hi all, We have over a longer period used the S3
             API withiAndi Christiansen ---2020/10/27 13:59:30---Hi all, We
             have over a longer period used the S3 API within spectrum
             Scale.. And that has shown that

             From: Andi Christiansen <andi at christiansen.xxx>
             To: "gpfsug-discuss at spectrumscale.org"
             <gpfsug-discuss at spectrumscale.org>
             Date: 2020/10/27 13:59
             Subject: [EXTERNAL] [gpfsug-discuss] Alternative to Scale S3
             API.
             Sent by: gpfsug-discuss-bounces at spectrumscale.org


             Hi all,


             We have over a longer period used the S3 API within spectrum
             Scale.. And that has shown that it does not support very many
             applications because of limitations of the API..


             Has anyone got any experience with any other product we can
             deploy on-top of Spectrum Scale that will give us a true S3
             API with full functionalities and able to answer on port 443?
             As of now we use HAProxy to forware ssl request back and forth
             from Scale S3 API.


             We have looked at MinIO which seems to be fairly simple and
             maybe might solve a lot of incompatibilities with clients
             software. But the product seems to be very badly documented at
             least for me.

             The idea is basically that a client uses their backup
             application(rubrik, veeam etc.) to connect to a domain(for
             example backup.mycompany.com) with their access and secret key
             and have access to their bucket only. and it must be over
             https/ssl.


             If someone has any knowledge to minio or any other product
             that might solve our problem I will be glad to hear from you!
             ?

             Thank you in advance!
             _______________________________________________
             gpfsug-discuss mailing list
             gpfsug-discuss at spectrumscale.org
             http://gpfsug.org/mailman/listinfo/gpfsug-discuss


             _______________________________________________
             gpfsug-discuss mailing list
             gpfsug-discuss at spectrumscale.org
             http://gpfsug.org/mailman/listinfo/gpfsug-discuss


 _______________________________________________
 gpfsug-discuss mailing list
 gpfsug-discuss at spectrumscale.org
 http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Ellei edell? ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3
Registered in Finland
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/3cef7908/attachment-0002.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 16781831.jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/3cef7908/attachment-0002.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/3cef7908/attachment-0002.gif>

From luis.bolinches at fi.ibm.com  Wed Oct 28 07:51:30 2020
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Wed, 28 Oct 2020 07:51:30 +0000
Subject: [gpfsug-discuss] Alternative to Scale S3 API.
In-Reply-To: <OF047A1414.5894B72F-ON4225860F.002A5F54-4225860F.002AA438@notes.na.collabserv.com>
References: <OF047A1414.5894B72F-ON4225860F.002A5F54-4225860F.002AA438@notes.na.collabserv.com>,
	<OF4063F396.B7DA9D54-ON4225860F.0023CA79-4225860F.00251FA3@notes.na.collabserv.com><OF06DC9BE9.BBFA65F5-ON0025860F.0027441B-0025860F.0027DBD0@notes.na.collabserv.com>
Message-ID: <OF2F6C2E30.34B1F144-ON0025860F.002B29E9-0025860F.002B2B09@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/e02ad49e/attachment-0002.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.1__=CDBB0C9CDFB9D9C48f9e8a93df938690918cCDB at .jpg
Type: image/jpeg
Size: 35643 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/e02ad49e/attachment-0002.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.2__=CDBB0C9CDFB9D9C48f9e8a93df938690918cCDB at .gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201028/e02ad49e/attachment-0002.gif>

From Robert.Oesterlin at nuance.com  Thu Oct 29 11:16:13 2020
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Thu, 29 Oct 2020 11:16:13 +0000
Subject: [gpfsug-discuss] SSUG Digital Expert Talk: 11/4 - AI workloads on
 NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale
Message-ID: <77EA43ED-C430-42CA-872E-D2307F244775@nuance.com>

Reminder for our  upcoming expert talk:

SSUG::Digital: Scalable multi-node training for AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale
November 4 @ 16:15 - 17:45 GMT

Nvidia and IBM did a complex proof-of-concept to demonstrate the scaling of AI workload using Nvidia DGX, Red Hat OpenShift and IBM Spectrum Scale at the example of ResNet-50 and the segmentation of images using the Audi A2D2 dataset. The project team published an IBM Redpaper with all the technical details and will present the key learnings and results.

Registration link for Webex session: https://www.spectrumscaleug.org/event/ssugdigital-multi-node-training-for-ai-workloads/


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201029/38f0946b/attachment-0002.htm>

From kkr at lbl.gov  Thu Oct 29 21:43:02 2020
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Thu, 29 Oct 2020 14:43:02 -0700
Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a
	panel?
In-Reply-To: <B8A6C2AA-B4B0-473B-B7EC-C751F0EC9D7C@lbl.gov>
References: <B8A6C2AA-B4B0-473B-B7EC-C751F0EC9D7C@lbl.gov>
Message-ID: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>

Really? There?s nothing you want to ask about GPFS/Spectrum Scale? There will be access to developers and management alike, so I have to imagine you have something to ask? Don?t be shy. 

Please help make this a lively discussion by submitting a question, or two. 

Best,
Kristy

> On Oct 13, 2020, at 2:34 PM, Kristy Kallback-Rose <kkr at lbl.gov> wrote:
> 
> Hi all,
> 
> 	By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we?re planning a couple 90-minute sessions and would like to do a panel during one of them. We?ll hope to do live Q&A, like an in-person Ask Me Anything session, but it?s probably a good idea to have a bank of questions ready as well, Also, that way your question may get asked, even if you can?t make the live session ?we?ll record these sessions for later viewing. 
> 
> 	So, please send your questions for the panel and we?ll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots:
> 
> November 16th - 8:00 AM Pacific/3:00 PM UTC 
> 
> November 18th - 8:00 AM Pacific/3:00 PM UTC 
> 
> Best,
> Kristy
> 
> Kristy Kallback-Rose
> Senior HPC Storage Systems Analyst
> National Energy Research Scientific Computing Center
> Lawrence Berkeley National Laboratory
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20201029/ec75fc5e/attachment-0002.htm>

From kkr at lbl.gov  Thu Oct 29 21:49:34 2020
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Thu, 29 Oct 2020 14:49:34 -0700
Subject: [gpfsug-discuss] SC20  Sessions - Dates and times are settled,
	please join us!
Message-ID: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov>

Hi all,

	The Spectrum Scale User Group will be hosting two 90 minute sessions at SC20 this year and we hope you can join us. The first one is:

 "Storage for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST 

and the second one is 

"What's new in Spectrum Scale 5.1?" and will be held Wednesday, Nov. 18th from 11:00-12:30 EST.  

Please see the calendar at https://www.spectrumscaleug.org/eventslist/2020-11/ and register by clicking on a session on the calendar and then the "Please register here to join the session" link.

Best,
Kristy

Kristy Kallback-Rose
Senior HPC Storage Systems Analyst
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory


From heinrich.billich at id.ethz.ch  Fri Oct 30 12:21:58 2020
From: heinrich.billich at id.ethz.ch (Billich  Heinrich Rainer (ID SD))
Date: Fri, 30 Oct 2020 12:21:58 +0000
Subject: [gpfsug-discuss] 'ganesha_mgr display_export - client not listed
Message-ID: <660DD807-C723-44EF-BC51-57EFB296FFC4@id.ethz.ch>

Hello,

Some nfsv4 client of ganesha does not show up in the output of 'ganesha_mgr display_export'. The client has an active mount, but also shows some nfs issues, some commands did hang, the process just stays in state D (uninterruptible sleep) according to 'ps', but not the whole mount.  

I just wonder if the client's IP should always show up in the output of display_export once the client did issue a mount call and if the absence indicates that something is broken. 
Gutr,gut,
Putting it the other way round: When is a client listed in the output of display_export and when is it removed from the list?

We do collect more debug data, this is just something that catched my eye.
Thank you,

Heiner

We run ganesha 2.7.5-ibm058.05 on a spectrum scale system on RedHat 7.7.

I crosspost to the gpfsug mailing list.

-- 
=======================
Heinrich Billich
ETH Z?rich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich at id.ethz.ch
========================
 
 
# ganesha_mgr  display_export 37
Display export with id 37
export 37: path = /xxxx/yyy, pseudo = /xxx/yyy , tag = /xxx/yyy
 Client type,  CIDR version, CIDR address, CIDR mask, CIDR proto, Anonymous UID, Anonymous GID, Attribute timeout, Options, Set
  a.b.c.198/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.143/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.236/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.34/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.70/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 a.b.c.71/32,  0,  0,  255,  1,  4294967294,  4294967294,  0,  1126195680, 1081209831
 *,  0,  0,  0,  0,  4294967294,  4294967294,  0,  1126187490, 1081209831


From skylar2 at uw.edu  Fri Oct 30 14:01:37 2020
From: skylar2 at uw.edu (Skylar Thompson)
Date: Fri, 30 Oct 2020 07:01:37 -0700
Subject: [gpfsug-discuss] SC20 Planning - What questions would you ask a
 panel?
In-Reply-To: <3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>
References: <B8A6C2AA-B4B0-473B-B7EC-C751F0EC9D7C@lbl.gov>
	<3D8238FB-F8A5-48F1-BA5C-57AC93DCDE35@lbl.gov>
Message-ID: <20201030140137.hakhxwppcmaoixy6@thargelion>

Here's one:

How is IBM working to improve the integration between TSM and GPFS? We're
in the biomedical space and have some overlapping regulatory requirements
around retention, which translate to complicated INCLUDE/EXCLUDE rules that
mmbackup has always had trouble processing. In particular, we need to be
able to INCLUDE particular paths to set a management class, but then
EXCLUDE particular paths, which results in mmbackup generating 
file lists for dsmc including those excluded paths, which dsmc can exclude
but it logs every single one every time it runs.

On Thu, Oct 29, 2020 at 02:43:02PM -0700, Kristy Kallback-Rose wrote:
> Really? There???s nothing you want to ask about GPFS/Spectrum Scale? There will be access to developers and management alike, so I have to imagine you have something to ask??? Don???t be shy. 
> 
> Please help make this a lively discussion by submitting a question, or two. 
> 
> Best,
> Kristy
> 
> > On Oct 13, 2020, at 2:34 PM, Kristy Kallback-Rose <kkr at lbl.gov> wrote:
> > 
> > Hi all,
> > 
> > 	By now you know SC will be digital this year. We are working towards some SC events for the Spectrum Scale User Group, and using our usual slot of Sunday did not seem like a great idea. So, we???re planning a couple 90-minute sessions and would like to do a panel during one of them. We???ll hope to do live Q&A, like an in-person Ask Me Anything session, but it???s probably a good idea to have a bank of questions ready as well, Also, that way your question may get asked, even if you can???t make the live session ???we???ll record these sessions for later viewing. 
> > 
> > 	So, please send your questions for the panel and we???ll get a list built up. Better yet, attend the sessions live! Details to come, but for now, hold these time slots:
> > 
> > November 16th - 8:00 AM Pacific/3:00 PM UTC 
> > 
> > November 18th - 8:00 AM Pacific/3:00 PM UTC 
> > 
> > Best,
> > Kristy
> > 
> > Kristy Kallback-Rose
> > Senior HPC Storage Systems Analyst
> > National Energy Research Scientific Computing Center
> > Lawrence Berkeley National Laboratory
> > 
> 

> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-- 
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department (UW Medicine), System Administrator
-- Foege Building S046, (206)-685-7354
-- Pronouns: He/Him/His


From cblack at nygenome.org  Fri Oct 30 14:19:24 2020
From: cblack at nygenome.org (Christopher Black)
Date: Fri, 30 Oct 2020 14:19:24 +0000
Subject: [gpfsug-discuss] SC20  Sessions - Dates and times are settled,
 please join us!
In-Reply-To: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov>
References: <8BECE369-B5B4-404F-B4C0-07EE02DE6295@lbl.gov>
Message-ID: <62E7471D-02B9-4C27-B0F0-4038CCB2C66E@nygenome.org>

Could you talk about upcoming work to address excessive prefetch when reading small fractions of many large files?
Some bioinformatics workloads have a client node reading relatively small regions of multiple 50GB+ files. We've seen this trigger excessive prefetch bandwidth (especially on 16MB block filesystem). Investigation shows that much of the prefetched data is never read, but cache gets full, evicts blocks, then more prefetch happens. We can avoid this by turning prefetch off, but that reduces speed of other workloads that read full files sequentially.  Turning prefetch on and off based on job won't work well for our users.

We've heard this would be addressed in gpfs 5.1 at the earliest and have provided an example workload to devs. They've done some great analysis and determined the problem is worse on large (16M) block filesystems (which are now the recommended and default on new ess filesystems with sub-block allocation enabled).

Best,
Chris

?On 10/29/20, 5:49 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Kristy Kallback-Rose" <gpfsug-discuss-bounces at spectrumscale.org on behalf of kkr at lbl.gov> wrote:

    Hi all,

    The Spectrum Scale User Group will be hosting two 90 minute sessions at SC20 this year and we hope you can join us. The first one is:

     "Storage for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST

    and the second one is

    "What's new in Spectrum Scale 5.1?" and will be held Wednesday, Nov. 18th from 11:00-12:30 EST.

    Please see the calendar at https://urldefense.com/v3/__https://www.spectrumscaleug.org/eventslist/2020-11/__;!!C6sPl7C9qQ!G0wT65UH3HoMnjBM6_ZAVfZwWwJz5SoLE5gpB_LM0N8SNSU3TXItF31dfxG_8Pow$  and register by clicking on a session on the calendar and then the "Please register here to join the session" link.

    Best,
    Kristy

    Kristy Kallback-Rose
    Senior HPC Storage Systems Analyst
    National Energy Research Scientific Computing Center
    Lawrence Berkeley National Laboratory

    _______________________________________________
    gpfsug-discuss mailing list
    gpfsug-discuss at spectrumscale.org
    https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!C6sPl7C9qQ!G0wT65UH3HoMnjBM6_ZAVfZwWwJz5SoLE5gpB_LM0N8SNSU3TXItF31df0lybvoA$

________________________________

This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email.