[gpfsug-discuss] CES doesn't assign addresses to nodes

Jonathon A Anderson jonathon.anderson at colorado.edu
Tue Jan 31 15:13:34 GMT 2017


The tail isn’t the issue; that’s my addition, so that I didn’t have to paste the hundred-or-so-line nodelist into the thread.

The actual command is

tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile

But you can see in my tailed output that the last hostname listed is cut off halfway through. Less obvious in the example, but also true: it’s only showing the first 120 hosts, when we have 403 nodes in our GPFS cluster.

[root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l
120

[root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l
403

Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters.

[root at sgate2 ~]# tsctl shownodes up | wc -c
3983
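
As a rough sanity check, 403 hostnames could never fit in 3983 bytes anyway: entries in the truncated list average about 33 bytes each (e.g. `shas0251-opa.rc.int.colorado.edu` plus a comma), so the full list should run to roughly 13 KB.

# average bytes per entry, separator included, computed from what we do get:
tsctl shownodes up | tr ',' '\n' | awk '{ sum += length($0) + 1 } END { print sum / NR }'
# ~33 bytes/entry * 403 nodes ~= 13 KB, far beyond the 3983 bytes observed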

Again, I’m convinced this is a bug, not only because the command doesn’t actually produce a list of all of the up nodes in our cluster, but also because the last name listed is incomplete.

[root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1
shas0260-opa.rc.int.col[root at sgate2 ~]#

I’d continue my investigation within tsctl itself but, alas, it’s a binary with no source code available to me. :)
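
(About the only black-box probe left would be strace, to see whether tsctl itself performs one short ~4 KB write or already receives a truncated reply from the daemon. An untested sketch:)

# trace tsctl's reads and writes; a lone ~3983-byte write() on stdout
# would point at a fixed-size buffer somewhere in the pipeline
strace -e trace=read,write -o /tmp/tsctl.trace /usr/lpp/mmfs/bin/tsctl shownodes up > /dev/null
grep 'write(1,' /tmp/tsctl.trace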

I’m trying to get this opened as a bug / PMR, but I’m still working through the DDN support infrastructure. Thanks for reporting it, though.

For the record:

[root at sgate2 ~]# rpm -qa | grep -i gpfs
gpfs.base-4.2.1-2.x86_64
gpfs.msg.en_US-4.2.1-2.noarch
gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64
gpfs.gskit-8.0.50-57.x86_64
gpfs.gpl-4.2.1-2.noarch
nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64
gpfs.ext-4.2.1-2.x86_64
gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64
gpfs.docs-4.2.1-2.noarch

~jonathon


From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Olaf Weiser <olaf.weiser at de.ibm.com>
Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Tuesday, January 31, 2017 at 1:30 AM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

Hi, same thing here: everything after 10 nodes gets truncated.
Though I don't have an issue with it, I'll open a PMR, and I recommend you do the same. ;-)

The reason seems simple: it's the "| tail" at the end of the command, which truncates the output to the last 10 items.

Should be easy to fix.
cheers
olaf





From:        Jonathon A Anderson <jonathon.anderson at colorado.edu>
To:        "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date:        01/30/2017 11:11 PM
Subject:        Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________



In trying to figure this out on my own, I’m relatively certain I’ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm?


Here are the details of my investigation:


## GPFS is up on sgate2

[root at sgate2 ~]# mmgetstate

Node number  Node name        GPFS state
------------------------------------------
    414      sgate2-opa       active


## but if I tell CES to explicitly put one of our CES addresses on that node, it says GPFS is down

[root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa
mmces address move: GPFS is down on this node.
mmces address move: Command failed. Examine previous error messages to determine cause.


## the “GPFS is down on this node” message is defined as code 109 in mmglobfuncs

[root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs
   109 ) msgTxt=\
"%s: GPFS is down on this node."


## and is generated by printErrorMsg in mmcesnetmvaddress when the target node appears in the “down” list returned by getDownCesNodeList

[root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress
 downNodeList=$(getDownCesNodeList)
 for downNode in $downNodeList
 do
   if [[ $toNodeName == $downNode ]]
   then
     printErrorMsg 109 "$mmcmd"


## getDownCesNodeList is the set of CES nodes that are missing from the GPFS cluster nodes listed by `tsctl shownodes up`

[root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs
function getDownCesNodeList
{
 typeset sourceFile="mmcesfuncs.sh"
 [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x
 $mmTRACE_ENTER "$*"

 typeset upnodefile=${cmdTmpDir}upnodefile
 typeset downNodeList

 # get all CES nodes
 $sort -o $nodefile $mmfsCesNodes.dae

 $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile

 downNodeList=$($comm -23 $nodefile $upnodefile)
 print -- $downNodeList
}  #----- end of function getDownCesNodeList --------------------
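
(For anyone following along: `comm -23` prints the lines unique to its first sorted input, so every CES node missing from the truncated up-list gets reported as down. A minimal illustration:)

# comm -23 emits lines found only in the first file; both inputs must be sorted
printf 'sgate1-opa\nsgate2-opa\n' > /tmp/ces.nodes
printf 'sgate1-opa\n' > /tmp/up.nodes
comm -23 /tmp/ces.nodes /tmp/up.nodes    # prints: sgate2-opa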


## but not only are the sgate nodes missing from `tsctl shownodes up`; its output is obviously and erroneously truncated

[root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail
shas0251-opa.rc.int.colorado.edu
shas0252-opa.rc.int.colorado.edu
shas0253-opa.rc.int.colorado.edu
shas0254-opa.rc.int.colorado.edu
shas0255-opa.rc.int.colorado.edu
shas0256-opa.rc.int.colorado.edu
shas0257-opa.rc.int.colorado.edu
shas0258-opa.rc.int.colorado.edu
shas0259-opa.rc.int.colorado.edu
shas0260-opa.rc.int.col[root at sgate2 ~]#


## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`.
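
(If I needed a stopgap in the meantime, I'd be tempted to have getDownCesNodeList build $upnodefile from mmgetstate instead of tsctl. Strictly a hypothetical sketch: it assumes mmgetstate's -Y layout puts the node name in field 7 and the state in field 9 (check the HEADER line), and that those names match the daemon names in $mmfsCesNodes.dae:)

# hypothetical replacement for the tsctl line in getDownCesNodeList;
# NOT a supported fix
/usr/lpp/mmfs/bin/mmgetstate -a -Y | grep -v ':HEADER:' \
    | awk -F: '$9 == "active" { print $7 }' | $sort -o $upnodefile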



On 1/24/17, 12:48 PM, "Jonathon A Anderson" <jonathon.anderson at colorado.edu> wrote:

   I think I'm having the same issue described here:

   http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html

   Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804)

   We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS.

   Here are the steps I took:

   ---
   mmcrnodeclass protocol -N sgate1-opa,sgate2-opa
   mmcrnodeclass nfs -N sgate1-opa,sgate2-opa
   mmchconfig cesSharedRoot=/gpfs/summit/ces
   mmchcluster --ccr-enable
   mmchnode --ces-enable -N protocol
   mmces service enable NFS
   mmces service start NFS -N nfs
   mmces address add --ces-ip 10.225.71.104,10.225.71.105
   mmces address policy even-coverage
   mmces address move --rebalance
   ---
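
   (For anyone reproducing this, the state those steps should leave behind can be double-checked with the usual listing commands. A sketch; output formats vary by release:)

   ---
   mmces node list        # both sgate nodes should show as CES nodes
   mmces service list -a  # NFS should be running on the nfs nodeclass
   mmces address list     # the two CES IPs and their current assignments
   ---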

   This worked the very first time I ran it, but the CES addresses weren't redistributed after restarting GPFS or rebooting a node.

   Things I've tried:

   * disabling ces on the sgate nodes and re-running the above procedure
   * moving the cluster and filesystem managers to different snsd nodes
   * deleting and re-creating the cesSharedRoot directory

   Meanwhile, the following log entries appear in mmfs.log.latest every ~30s:

   ---
   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104
   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105
   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1
   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+
   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+
   ---

   Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries):

   ---
   2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275
   2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333
   ---

   For the record, here's the interface I expect to get the address on sgate1:

   ---
   11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP
   link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
   inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0
   valid_lft forever preferred_lft forever
   inet6 fe80::3efd:feff:fe08:a7c0/64 scope link
   valid_lft forever preferred_lft forever
   ---

   which is a bond of p2p1 and p2p2.

   ---
   6: p2p1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP qlen 1000
   link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
   7: p2p2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP qlen 1000
   link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
   ---

   A similar bond0 exists on sgate2.
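
   (Two quick sanity checks on the plumbing, for completeness: the CES IPs do fall inside bond0's /20, and the bond should report both slaves up. This assumes the RHEL initscripts ipcalc:)

   ---
   # 10.225.71.107/20 is on network 10.225.64.0/20, which contains the CES
   # IPs 10.225.71.104 and 10.225.71.105:
   ipcalc -n 10.225.71.107/20    # NETWORK=10.225.64.0
   # kernel view of the bond and its slaves:
   cat /proc/net/bonding/bond0
   ---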

   I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far.



_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



