[gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD Server with only ib rdma enabled

Giovanni Bracco giovanni.bracco at enea.it
Wed Feb 3 08:58:37 GMT 2021


We did not explore the question of IBM support. Because of budget 
limitations and the mandatory integration of the data space between 
the two clusters, we decided to try the multi-fabric setup, and so far 
it has been working without problems.

Giovanni

On 02/02/21 14:10, Walter Sklenka wrote:
> Hi Giovanni!
> 
> Thank you for your offer! 😊
> 
> It is planned to be implemented in June or so.
> 
> We will use RHEL 8.x and the newest GPFS version available.
> 
> Just one question for the moment, if I may:
> 
> Did you ever run into any problems with IBM support? The FAQ says 
> briefly "not supported", but do they support your environment, or do 
> you accept that RDMA problems would have to be fixed without IBM?
> 
> Thank you very much and have great days! And keep healthy!
> 
> Best regards walter
> 
> -----Original Message-----
> From: Giovanni Bracco <giovanni.bracco at enea.it>
> Sent: Monday, 1 February 2021 20:42
> To: Walter Sklenka <Walter.Sklenka at EDV-Design.at>
> Cc: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Subject: Re: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD 
> Server with only ib rdma enabled
> 
> On 30/01/21 21:01, Walter Sklenka wrote:
> 
>  > Hi Giovanni!
> 
>  > That's great! Many thanks for your fast and detailed answer!!!!
> 
>  > So this is the way we will go too!
> 
>  >
> 
>  > Have a nice weekend and keep healthy!
> 
>  > Best regards
> 
>  > Walter
> 
>  >
> 
> I suppose you will implement the solution with more recent versions of 
> the software components, so please let me know if everything works!
> 
> If you have any issues, I am ready to discuss!
> 
> Regards
> 
> Giovanni
> 
>  > -----Original Message-----
> 
>  > From: Giovanni Bracco <giovanni.bracco at enea.it>
> 
>  > Sent: Saturday, 30 January 2021 18:08
> 
>  > To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>;
> 
>  > Walter Sklenka <Walter.Sklenka at EDV-Design.at>
> 
>  > Subject: Re: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD
> 
>  > Server with only ib rdma enabled
> 
>  >
> 
>  > In our HPC infrastructure we have 6 NSD servers running CentOS 7.4, 
>  > each with 1 Intel QDR HCA to a QDR cluster (now 100 SandyBridge 
>  > nodes; previously 300 nodes with CentOS 6.5), 1 OPA HFI to the main 
>  > OPA cluster (400 Skylake nodes, CentOS 7.3) and 1 Mellanox FDR HCA 
>  > to the DDN storage. It has worked nicely using RDMA since 2018, with 
>  > GPFS 4.2.3-19.
> 
>  > See
> 
>  > F. Iannone et al., "CRESCO ENEA HPC clusters: a working example of a
> 
>  > multifabric GPFS Spectrum Scale layout," 2019 International Conference
> 
>  > on High Performance Computing & Simulation (HPCS), Dublin, Ireland,
> 
>  > 2019, pp. 1051-1052, doi: 10.1109/HPCS48598.2019.918813
> 
>  >
> 
>  > When setting up the system the main trick has been:
> 
>  > just use the CentOS drivers and do not install OFED. We do not use IPoIB.
> 
>  >
> 
>  > Giovanni
> 
>  >
> 
>  > On 30/01/21 06:45, Walter Sklenka wrote:
> 
>  >> Hi!
> 
>  >>
> 
>  >> Is it possible to mix OPA cards and InfiniBand HCAs on the same server?
> 
>  >>
> 
>  >> In the FAQ
> 
>  >> https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#rdma
> 
>  >>
> 
>  >> they say about RDMA:
> 
>  >>
> 
>  >> "RDMA is NOT  supported on a node when both Mellanox HCAs and Intel
> 
>  >> Omni-Path HFIs are ENABLED for RDMA."
> 
>  >>
> 
>  >> So do I understand correctly: when we do NOT enable RDMA on the OPA
> 
>  >> interface, we can still enable it on IB?
> 
>  >>
> 
>  >> The reason I ask is that we have a GPFS cluster of 6 NSD servers
> 
>  >> (with access to storage) with OPA interfaces, which provide access to
> 
>  >> a remote cluster, also via OPA.
> 
>  >>
> 
>  >> A new cluster with HDR interfaces will be implemented soon.
> 
>  >>
> 
>  >> It shall have access to the same filesystems.
> 
>  >>
> 
>  >> When we add HDR interfaces to the NSD servers and enable RDMA on this
> 
>  >> network while disabling RDMA on OPA, we would accept the worse
> 
>  >> performance via OPA. We hope that this still provides better
> 
>  >> performance and less technical overhead than using routers.
> 
>  >>
> 
>  >> Or am I totally wrong?
> 
>  >>
> 
>  >> Thank you very much and keep healthy!
> 
>  >>
> 
>  >> Best regards
> 
>  >>
> 
>  >> Walter
> 
>  >>
> 
>  >> Kind regards
> 
>  >> */Walter Sklenka/*
> 
>  >> */Technical Consultant/*
> 
>  >>
> 
>  >> EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210
> 
>  >> Wien
> 
>  >> Tel: +43 1 29 22 165-31
> 
>  >> Fax: +43 1 29 22 165-90
> 
>  >> E-Mail: sklenka at edv-design.at
> 
>  >> Internet: www.edv-design.at
> 
>  >>
> 
>  >>
> 
>  >>
> 
>  >
> 
>  > --
> 
>  > Giovanni Bracco
> 
>  > phone  +39 351 8804788
> 
>  > E-mail giovanni.bracco at enea.it
> 
>  > WWW http://www.afs.enea.it/bracco
> 
>  >
> 

-- 
Giovanni Bracco
phone  +39 351 8804788
E-mail  giovanni.bracco at enea.it
WWW http://www.afs.enea.it/bracco
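
The approach discussed in this thread (RDMA enabled only on the InfiniBand ports of a node that also has an Omni-Path HFI) can be sketched roughly as below. This is an illustration under stated assumptions, not the configuration used at ENEA: it assumes that Mellanox HCAs appear under /sys/class/infiniband as mlx4_*/mlx5_* and OPA HFIs as hfi1_*, that port 1 is the one in use, and that the GPFS options verbsRdma and verbsPorts are set with mmchconfig. The helper names and the <nsd-servers> placeholder are hypothetical. The script only prints commands for review and does not run them.

```python
import os

# Assumption: Mellanox HCAs are named mlx4_*/mlx5_*; Omni-Path HFIs (hfi1_*)
# are deliberately left out so that RDMA is not enabled on them.
MELLANOX_PREFIXES = ("mlx4_", "mlx5_")


def list_rdma_devices(sysfs_dir="/sys/class/infiniband"):
    """Return the RDMA device names exposed by the kernel, or [] if none."""
    try:
        return sorted(os.listdir(sysfs_dir))
    except FileNotFoundError:
        return []


def build_verbs_ports(devices, port=1):
    """Build a 'device/port ...' string from the Mellanox devices only."""
    selected = [d for d in sorted(devices) if d.startswith(MELLANOX_PREFIXES)]
    return " ".join(f"{d}/{port}" for d in selected)


if __name__ == "__main__":
    value = build_verbs_ports(list_rdma_devices())
    # Hypothetical placeholder: replace <nsd-servers> with the real node list.
    print("mmchconfig verbsRdma=enable -N <nsd-servers>")
    print(f'mmchconfig verbsPorts="{value}" -N <nsd-servers>')
```

On a server with one HFI and one HCA, build_verbs_ports would keep only the mlx* device, so the OPA interface would carry GPFS traffic over TCP/IP rather than RDMA. Whether that is acceptable, and whether it is supported, should be checked against the current IBM FAQ.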


