[gpfsug-discuss] nodes being ejected out of the cluster

Jan-Frode Myklebust janfrode at tanso.net
Wed Jan 11 19:46:00 GMT 2017


I don't think you can change it without reloading GPFS. Also, it should be
turned off on all nodes, so it's a big change, unfortunately.
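
For reference, a rough sketch of the commands involved (untested here, and
assuming current mmchconfig syntax; please verify against your GPFS level
before running anything):

  mmchconfig verbsRdmaSend=no   # no -N list, so it applies to all nodes
  mmshutdown -a                 # requires a full GPFS outage
  mmstartup -a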


-jf
On Wed, Jan 11, 2017 at 20:22, Damir Krstic <damir.krstic at gmail.com> wrote:

> Can this be done live? Meaning, can GPFS remain up while I turn this off?
>
> Thanks,
> Damir
>
> On Wed, Jan 11, 2017 at 12:38 PM Jan-Frode Myklebust <janfrode at tanso.net>
> wrote:
>
> And there you have it:
>
> [ems1-fdr,compute,gss_ppc64]
> verbsRdmaSend yes
>
> Try turning this off.
>
>
> -jf
> On Wed, Jan 11, 2017 at 18:54, Damir Krstic <damir.krstic at gmail.com> wrote:
>
> Thanks for all the suggestions. Here is our mmlsconfig output. We just
> purchased another GL6. During the installation of the new GL6, IBM will
> upgrade our existing GL6 to the latest code levels. This will happen
> during the week of Jan 23.
>
> I am skeptical that the upgrade is going to fix the issue.
>
> On our IO servers we are running in connected mode (note that the IB
> interfaces are bonded):
>
> [root at gssio1 ~]# cat /sys/class/net/ib0/mode
>
> connected
>
> [root at gssio1 ~]# cat /sys/class/net/ib1/mode
>
> connected
>
> [root at gssio1 ~]# cat /sys/class/net/ib2/mode
>
> connected
>
> [root at gssio1 ~]# cat /sys/class/net/ib3/mode
>
> connected
>
> [root at gssio2 ~]# cat /sys/class/net/ib0/mode
>
> connected
>
> [root at gssio2 ~]# cat /sys/class/net/ib1/mode
>
> connected
>
> [root at gssio2 ~]# cat /sys/class/net/ib2/mode
>
> connected
>
> [root at gssio2 ~]# cat /sys/class/net/ib3/mode
>
> connected
>
> Our login nodes are running in connected mode as well.
>
> However, all of our compute nodes are running in datagram mode:
>
> [root at mgt ~]# psh compute cat /sys/class/net/ib0/mode
>
> qnode0758: datagram
>
> qnode0763: datagram
>
> qnode0760: datagram
>
> qnode0772: datagram
>
> qnode0773: datagram
> ....etc.
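>
> For reference, the IPoIB mode can normally be switched at runtime through
> sysfs (a rough sketch; the interface should be quiesced first, and the MTU
> usually needs adjusting afterwards, since connected mode allows a much
> larger IPoIB MTU):
>
> echo connected > /sys/class/net/ib0/mode
> ip link set ib0 mtu 65520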
>
> Here is our mmlsconfig:
>
> [root at gssio1 ~]# mmlsconfig
>
> Configuration data for cluster ess-qstorage.it.northwestern.edu:
>
> ----------------------------------------------------------------
>
> clusterName ess-qstorage.it.northwestern.edu
>
> clusterId 17746506346828356609
>
> dmapiFileHandleSize 32
>
> minReleaseLevel 4.2.0.1
>
> ccrEnabled yes
>
> cipherList AUTHONLY
>
> [gss_ppc64]
>
> nsdRAIDBufferPoolSizePct 80
>
> maxBufferDescs 2m
>
> prefetchPct 5
>
> nsdRAIDTracks 128k
>
> nsdRAIDSmallBufferSize 256k
>
> nsdMaxWorkerThreads 3k
>
> nsdMinWorkerThreads 3k
>
> nsdRAIDSmallThreadRatio 2
>
> nsdRAIDThreadsPerQueue 16
>
> nsdRAIDEventLogToConsole all
>
> nsdRAIDFastWriteFSDataLimit 256k
>
> nsdRAIDFastWriteFSMetadataLimit 1M
>
> nsdRAIDReconstructAggressiveness 1
>
> nsdRAIDFlusherBuffersLowWatermarkPct 20
>
> nsdRAIDFlusherBuffersLimitPct 80
>
> nsdRAIDFlusherTracksLowWatermarkPct 20
>
> nsdRAIDFlusherTracksLimitPct 80
>
> nsdRAIDFlusherFWLogHighWatermarkMB 1000
>
> nsdRAIDFlusherFWLogLimitMB 5000
>
> nsdRAIDFlusherThreadsLowWatermark 1
>
> nsdRAIDFlusherThreadsHighWatermark 512
>
> nsdRAIDBlockDeviceMaxSectorsKB 8192
>
> nsdRAIDBlockDeviceNrRequests 32
>
> nsdRAIDBlockDeviceQueueDepth 16
>
> nsdRAIDBlockDeviceScheduler deadline
>
> nsdRAIDMaxTransientStale2FT 1
>
> nsdRAIDMaxTransientStale3FT 1
>
> nsdMultiQueue 512
>
> syncWorkerThreads 256
>
> nsdInlineWriteMax 32k
>
> maxGeneralThreads 1280
>
> maxReceiverThreads 128
>
> nspdQueues 64
>
> [common]
>
> maxblocksize 16m
>
> [ems1-fdr,compute,gss_ppc64]
>
> numaMemoryInterleave yes
>
> [gss_ppc64]
>
> maxFilesToCache 12k
>
> [ems1-fdr,compute]
>
> maxFilesToCache 128k
>
> [ems1-fdr,compute,gss_ppc64]
>
> flushedDataTarget 1024
>
> flushedInodeTarget 1024
>
> maxFileCleaners 1024
>
> maxBufferCleaners 1024
>
> logBufferCount 20
>
> logWrapAmountPct 2
>
> logWrapThreads 128
>
> maxAllocRegionsPerNode 32
>
> maxBackgroundDeletionThreads 16
>
> maxInodeDeallocPrefetch 128
>
> [gss_ppc64]
>
> maxMBpS 16000
>
> [ems1-fdr,compute]
>
> maxMBpS 10000
>
> [ems1-fdr,compute,gss_ppc64]
>
> worker1Threads 1024
>
> worker3Threads 32
>
> [gss_ppc64]
>
> ioHistorySize 64k
>
> [ems1-fdr,compute]
>
> ioHistorySize 4k
>
> [gss_ppc64]
>
> verbsRdmaMinBytes 16k
>
> [ems1-fdr,compute]
>
> verbsRdmaMinBytes 32k
>
> [ems1-fdr,compute,gss_ppc64]
>
> verbsRdmaSend yes
>
> [gss_ppc64]
>
> verbsRdmasPerConnection 16
>
> [ems1-fdr,compute]
>
> verbsRdmasPerConnection 256
>
> [gss_ppc64]
>
> verbsRdmasPerNode 3200
>
> [ems1-fdr,compute]
>
> verbsRdmasPerNode 1024
>
> [ems1-fdr,compute,gss_ppc64]
>
> verbsSendBufferMemoryMB 1024
>
> verbsRdmasPerNodeOptimize yes
>
> verbsRdmaUseMultiCqThreads yes
>
> [ems1-fdr,compute]
>
> ignorePrefetchLUNCount yes
>
> [gss_ppc64]
>
> scatterBufferSize 256K
>
> [ems1-fdr,compute]
>
> scatterBufferSize 256k
>
> syncIntervalStrict yes
>
> [ems1-fdr,compute,gss_ppc64]
>
> nsdClientCksumTypeLocal ck64
>
> nsdClientCksumTypeRemote ck64
>
> [gss_ppc64]
>
> pagepool 72856M
>
> [ems1-fdr]
>
> pagepool 17544M
>
> [compute]
>
> pagepool 4g
>
> [ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64]
>
> verbsRdma enable
>
> [gss_ppc64]
>
> verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2
>
> [ems1-fdr]
>
> verbsPorts mlx5_0/1 mlx5_0/2
>
> [qsched03-ib0,quser10-fdr,compute]
>
> verbsPorts mlx4_0/1
>
> [common]
>
> autoload no
>
> [ems1-fdr,compute,gss_ppc64]
>
> maxStatCache 0
>
> [common]
>
> envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1
>
> deadlockOverloadThreshold 0
>
> deadlockDetectionThreshold 0
>
> adminMode central
>
>
> File systems in cluster ess-qstorage.it.northwestern.edu:
>
> ---------------------------------------------------------
>
> /dev/home
>
> /dev/hpc
>
> /dev/projects
>
> /dev/tthome
>
> On Wed, Jan 11, 2017 at 9:16 AM Luis Bolinches <luis.bolinches at fi.ibm.com>
> wrote:
>
> In addition to what Olaf has said
>
> ESS upgrades include Mellanox module upgrades on the ESS nodes. In fact,
> on those nodes you should not update those modules on their own (unless
> support says so in your PMR), so if that's been the recommendation, I
> suggest you look into it.
>
> Changelog on ESS 4.0.4 (no idea what ESS level you are running)
>
>
>   c) Support of MLNX_OFED_LINUX-3.2-2.0.0.1
>      - Updated from MLNX_OFED_LINUX-3.1-1.0.6.1 (ESS 4.0, 4.0.1, 4.0.2)
>      - Updated from MLNX_OFED_LINUX-3.1-1.0.0.2 (ESS 3.5.x)
>      - Updated from MLNX_OFED_LINUX-2.4-1.0.2 (ESS 3.0.x)
>      - Support for PCIe3 LP 2-port 100 Gb EDR InfiniBand adapter x16 (FC EC3E)
>        - Requires System FW level FW840.20 (SV840_104)
>      - No changes from ESS 4.0.3
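>
> (To check which Mellanox OFED level is currently installed on a node,
> assuming a standard MLNX_OFED install, something like:
>
> ofed_info -s
>
> should print the installed version string.)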
>
>
> --
> Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations
>
> Luis Bolinches
> Lab Services
> http://www-03.ibm.com/systems/services/labservices/
>
> IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland
> Phone: +358 503112585
>
> "If you continually give you will continually have." Anonymous
>
>
>
> ----- Original message -----
> From: "Olaf Weiser" <olaf.weiser at de.ibm.com>
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>
> Cc:
> Subject: Re: [gpfsug-discuss] nodes being ejected out of the cluster
> Date: Wed, Jan 11, 2017 5:03 PM
>
> Most likely, there's something wrong with your IB fabric...
> You say you run ~700 nodes?
> Are you running with *verbsRdmaSend* enabled? If so, please consider
> disabling it, and discuss this within the PMR.
> Another thing you may want to check: are you running IPoIB in connected
> mode or in datagram mode? But as I said, please discuss this within the
> PMR; there are too many dependencies to discuss here.
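>
> A quick way to see what is currently in effect (a sketch, assuming
> standard Spectrum Scale and Linux tooling):
>
> mmlsconfig verbsRdmaSend         # configured value
> mmdiag --config | grep -i verbs  # values in effect on the local daemon
> cat /sys/class/net/ib0/mode      # IPoIB mode of a given interface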
>
>
> cheers
>
>
> Mit freundlichen Grüßen / Kind regards
>
>
> Olaf Weiser
>
> EMEA Storage Competence Center Mainz, Germany / IBM Systems, Storage
> Platform,
>
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland
> IBM Allee 1
> 71139 Ehningen
> Phone: +49-170-579-44-66
> E-Mail: olaf.weiser at de.ibm.com
>
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
> Geschäftsführung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert
> Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
> HRB 14562 / WEEE-Reg.-Nr. DE 99369940
>
>
>
> From:        Damir Krstic <damir.krstic at gmail.com>
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:        01/11/2017 03:39 PM
> Subject:        [gpfsug-discuss] nodes being ejected out of the cluster
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our
> storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are
> connected via InfiniBand (FDR14). At the time of the ESS implementation,
> we were instructed to enable RDMA in addition to IPoIB. Previously, we ran
> only IPoIB on our GPFS 3.5 cluster.
>
> Ever since the implementation (sometime back in July of 2016), we have
> seen a lot of compute nodes being ejected. What usually precedes the
> ejection are the following messages:
>
> Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum
> 0 vendor_err 135
> Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error
> IBV_WC_RNR_RETRY_EXC_ERR index 2
> Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum
> 0 vendor_err 135
> Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error
> IBV_WC_WR_FLUSH_ERR index 1
> Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum
> 0 vendor_err 135
> Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error
> IBV_WC_RNR_RETRY_EXC_ERR index 2
> Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum
> 0 vendor_err 135
> Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error
> IBV_WC_WR_FLUSH_ERR index 400
>
> Even our ESS IO server sometimes ends up being ejected (case in point -
> yesterday morning):
>
> Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum
> 0 vendor_err 135
> Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error
> IBV_WC_RNR_RETRY_EXC_ERR index 3001
> Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum
> 0 vendor_err 135
> Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error
> IBV_WC_RNR_RETRY_EXC_ERR index 2671
> Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum
> 0 vendor_err 135
> Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error
> IBV_WC_RNR_RETRY_EXC_ERR index 2495
> Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum
> 0 vendor_err 135
> Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error
> IBV_WC_RNR_RETRY_EXC_ERR index 3077
> Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease
> renewal is overdue. Pinging to check if it is alive
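>
> (As I understand it, IBV_WC_RNR_RETRY_EXC_ERR means the peer repeatedly
> had no receive buffer posted, i.e. "receiver not ready", until the RNR
> retry count was exhausted. A rough fabric sanity check, assuming the
> infiniband-diags package is installed:
>
> ibqueryerrors               # scan the fabric for ports with error counters
> perfquery -x <lid> <port>   # extended counters for one suspect port
> )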
>
> I've had multiple PMRs open for this issue, and I am told that our ESS
> needs code-level upgrades in order to fix it. Looking at the errors, I
> think the issue is InfiniBand-related, and I am wondering whether anyone
> on this list has seen similar issues?
>
> Thanks for your help in advance.
>
> Damir
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>
>
> Ellei edellä ole toisin mainittu: / Unless stated otherwise above:
> Oy IBM Finland Ab
> PL 265, 00101 Helsinki, Finland
> Business ID, Y-tunnus: 0195876-3
> Registered in Finland
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>