<font size=2 face="sans-serif">OK, so it seems we have several issues.</font><br><font size=2 face="sans-serif">The 3983-character limit is clearly a defect.</font><br><font size=2 face="sans-serif">Have you already raised a PMR? If so, can you send me the number?</font><br><br><br><br><br><font size=1 color=#5f5f5f face="sans-serif">From:      

 </font><font size=1 face="sans-serif">Jonathon A Anderson

<jonathon.anderson@colorado.edu></font><br><font size=1 color=#5f5f5f face="sans-serif">To:      

 </font><font size=1 face="sans-serif">gpfsug main discussion

list <gpfsug-discuss@spectrumscale.org></font><br><font size=1 color=#5f5f5f face="sans-serif">Date:      

 </font><font size=1 face="sans-serif">01/31/2017 04:14 PM</font><br><font size=1 color=#5f5f5f face="sans-serif">Subject:    

   </font><font size=1 face="sans-serif">Re: [gpfsug-discuss]

CES doesn't assign addresses to nodes</font><br><font size=1 color=#5f5f5f face="sans-serif">Sent by:    

   </font><font size=1 face="sans-serif">gpfsug-discuss-bounces@spectrumscale.org</font><br><hr noshade><br><br><br><font size=2 face="Calibri">The tail isn’t the issue; that’s my addition, so that I didn’t have to paste the hundred-or-so-line node list into the thread.</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">The actual command is</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">But you can see in my tailed output that

the last hostname listed is cut off halfway through. Less obvious in the example, but true, is that it’s only showing the first 120 hosts, when we have 403 nodes in our GPFS cluster.</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">[root@sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l</font><br><font size=2 face="Calibri">120</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">[root@sgate2 ~]# mmlscluster | grep '\-opa'

| wc -l</font><br><font size=2 face="Calibri">403</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">Perhaps more explicitly, it looks like

`tsctl shownodes up` can only transmit 3983 characters.</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">[root@sgate2 ~]# tsctl shownodes up | wc

-c</font><br><font size=2 face="Calibri">3983</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">Again, I’m convinced this is a bug, not only because the command doesn’t produce a list of all of the up nodes in our cluster, but also because the last name listed is incomplete.</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">[root@sgate2 ~]# tsctl shownodes up | tr

',' '\n' | tail -n 1</font><br><font size=2 face="Calibri">shas0260-opa.rc.int.col[root@sgate2 ~]#</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">I’d continue my investigation within tsctl

itself but, alas, it’s a binary with no source code available to me. :)</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">I’m trying to get this opened as a bug

/ PMR; but I’m still working through the DDN support infrastructure. Thanks

for reporting it, though.</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">For the record:</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">[root@sgate2 ~]# rpm -qa | grep -i gpfs</font><br><font size=2 face="Calibri"><b>gpfs</b>.base-4.2.1-2.x86_64</font><br><font size=2 face="Calibri"><b>gpfs</b>.msg.en_US-4.2.1-2.noarch</font><br><font size=2 face="Calibri"><b>gpfs</b>.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64</font><br><font size=2 face="Calibri"><b>gpfs</b>.gskit-8.0.50-57.x86_64</font><br><font size=2 face="Calibri"><b>gpfs</b>.gpl-4.2.1-2.noarch</font><br><font size=2 face="Calibri">nfs-ganesha-<b>gpfs</b>-2.3.2-0.ibm24.el7.x86_64</font><br><font size=2 face="Calibri"><b>gpfs</b>.ext-4.2.1-2.x86_64</font><br><font size=2 face="Calibri"><b>gpfs</b>.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64</font><br><font size=2 face="Calibri"><b>gpfs</b>.docs-4.2.1-2.noarch</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri">~jonathon</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri"> </font><br><font size=3 face="Calibri"><b>From: </b><gpfsug-discuss-bounces@spectrumscale.org>

on behalf of Olaf Weiser <olaf.weiser@de.ibm.com><b><br>Reply-To: </b>gpfsug main discussion list <gpfsug-discuss@spectrumscale.org><b><br>Date: </b>Tuesday, January 31, 2017 at 1:30 AM<b><br>To: </b>gpfsug main discussion list <gpfsug-discuss@spectrumscale.org><b><br>Subject: </b>Re: [gpfsug-discuss] CES doesn't assign addresses to nodes</font><br><font size=3 face="Times New Roman"> </font><br><font size=2 face="sans-serif">Hi, same thing here: everything after 10 nodes is truncated.<br>Though I don't have an issue with it, I'll open a PMR, and I recommend you do the same. ;-) </font><font size=3 face="Times New Roman"><br></font><font size=2 face="sans-serif"><br>The reason seems simple: it is the <i>"| tail"</i> at the end of the command, which truncates the output to the last 10 items.</font><font size=3 face="Times New Roman"><br></font><font size=2 face="sans-serif"><br>Should be easy to fix.<br>cheers<br>olaf</font><font size=3 face="Times New Roman"><br><br><br><br><br></font><font size=1 color=#5f5f5f face="sans-serif"><br>From:        </font><font size=1 face="sans-serif">Jonathon

A Anderson <jonathon.anderson@colorado.edu></font><font size=1 color=#5f5f5f face="sans-serif"><br>To:        </font><font size=1 face="sans-serif">"gpfsug-discuss@spectrumscale.org"

<gpfsug-discuss@spectrumscale.org></font><font size=1 color=#5f5f5f face="sans-serif"><br>Date:        </font><font size=1 face="sans-serif">01/30/2017

11:11 PM</font><font size=1 color=#5f5f5f face="sans-serif"><br>Subject:        </font><font size=1 face="sans-serif">Re:

[gpfsug-discuss] CES doesn't assign addresses to nodes</font><font size=1 color=#5f5f5f face="sans-serif"><br>Sent by:        </font><font size=1 face="sans-serif">gpfsug-discuss-bounces@spectrumscale.org</font><div align=center><hr noshade></div><br><font size=3 face="Times New Roman"><br><br></font><font size=2><br>In trying to figure this out on my own, I’m relatively certain I’ve found

a bug in GPFS related to the truncation of output from `tsctl shownodes

up`. Any chance someone in development can confirm?<br><br><br>Here are the details of my investigation:<br><br><br>## GPFS is up on sgate2<br><br>[root@sgate2 ~]# mmgetstate<br><br>Node number  Node name        GPFS state <br>------------------------------------------<br>    414      sgate2-opa       active<br><br><br>## but if I tell ces to explicitly put one of our ces addresses on that

node, it says that GPFS is down<br><br>[root@sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa<br>mmces address move: GPFS is down on this node.<br>mmces address move: Command failed. Examine previous error messages to

determine cause.<br><br><br>## the “GPFS is down on this node” message is defined as code 109 in

mmglobfuncs<br><br>[root@sgate2 ~]# grep --before-context=1 "GPFS is down on this node."

/usr/lpp/mmfs/bin/mmglobfuncs<br>   109 ) msgTxt=\<br>"%s: GPFS is down on this node."<br><br><br>## and is generated by printErrorMsg in mmcesnetmvaddress when it detects

that the current node is identified as “down” by getDownCesNodeList<br><br>[root@sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress<br> downNodeList=$(getDownCesNodeList)<br> for downNode in $downNodeList<br> do<br>   if [[ $toNodeName == $downNode ]]<br>   then<br>     printErrorMsg 109 "$mmcmd"<br><br><br>## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster

nodes listed in `tsctl shownodes up`<br><br>[root@sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList'

/usr/lpp/mmfs/bin/mmcesfuncs<br>function getDownCesNodeList<br>{<br> typeset sourceFile="mmcesfuncs.sh"<br> [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x<br> $mmTRACE_ENTER "$*"<br><br> typeset upnodefile=${cmdTmpDir}upnodefile<br> typeset downNodeList<br><br> # get all CES nodes<br> $sort -o $nodefile $mmfsCesNodes.dae<br><br> $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile<br><br> downNodeList=$($comm -23 $nodefile $upnodefile)<br> print -- $downNodeList<br>}  #----- end of function getDownCesNodeList --------------------<br><br><br>## but not only are the sgate nodes not listed by `tsctl shownodes up`;

its output is obviously and erroneously truncated<br><br>[root@sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail<br>shas0251-opa.rc.int.colorado.edu<br>shas0252-opa.rc.int.colorado.edu<br>shas0253-opa.rc.int.colorado.edu<br>shas0254-opa.rc.int.colorado.edu<br>shas0255-opa.rc.int.colorado.edu<br>shas0256-opa.rc.int.colorado.edu<br>shas0257-opa.rc.int.colorado.edu<br>shas0258-opa.rc.int.colorado.edu<br>shas0259-opa.rc.int.colorado.edu<br>shas0260-opa.rc.int.col[root@sgate2 ~]#<br><br><br>## I expect that this is a bug in GPFS, likely related to a maximum output

buffer for `tsctl shownodes up`.<br><br><br><br>On 1/24/17, 12:48 PM, "Jonathon A Anderson" <jonathon.anderson@colorado.edu> wrote:<br>   <br>   I think I'm having the same issue described here:<br>   <br>   <a href="http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html"><font size=2 color=blue>http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html</font></a><br>   <br>   Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open (78804).<br>   <br>   We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying

to add two CES protocol nodes (sgate{1,2}) to serve NFS. <br>   <br>   Here are the steps I took: <br>   <br>   --- <br>   mmcrnodeclass protocol -N sgate1-opa,sgate2-opa <br>   mmcrnodeclass nfs -N sgate1-opa,sgate2-opa <br>   mmchconfig cesSharedRoot=/gpfs/summit/ces <br>   mmchcluster --ccr-enable <br>   mmchnode --ces-enable -N protocol <br>   mmces service enable NFS <br>   mmces service start NFS -N nfs <br>   mmces address add --ces-ip 10.225.71.104,10.225.71.105 <br>   mmces address policy even-coverage <br>   mmces address move --rebalance <br>   --- <br>   <br>   This worked the very first time I ran it, but the CES addresses

weren't re-distributed after restarting GPFS or a node reboot. <br>   <br>   Things I've tried: <br>   <br>   * disabling ces on the sgate nodes and re-running the above procedure

<br>   * moving the cluster and filesystem managers to different snsd

nodes <br>   * deleting and re-creating the cesSharedRoot directory <br>   <br>   Meanwhile, the following log entry appears in mmfs.log.latest every

~30s: <br>   <br>   --- <br>   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned

address 10.225.71.104 <br>   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned

address 10.225.71.105 <br>   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem

with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 <br>   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses:

10.225.71.104_0-_+,10.225.71.105_0-_+ <br>   Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs:

10.225.71.104_0-_+,10.225.71.105_0-_+ <br>   --- <br>   <br>   Also notable, whenever I add or remove addresses now, I see this

in mmsysmonitor.log (among a lot of other entries): <br>   <br>   --- <br>   2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without

requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected

- Service.calculateAndUpdateState:275 <br>   2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple

entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333

<br>   --- <br>   <br>   For the record, here's the interface I expect to get the address

on sgate1: <br>   <br>   --- <br>   11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000

qdisc noqueue state UP <br>   link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff <br>   inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 <br>   valid_lft forever preferred_lft forever <br>   inet6 fe80::3efd:feff:fe08:a7c0/64 scope link <br>   valid_lft forever preferred_lft forever <br>   --- <br>   <br>   which is a bond of p2p1 and p2p2. <br>   <br>   --- <br>   6: p2p1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000

qdisc mq master bond0 state UP qlen 1000 <br>   link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff <br>   7: p2p2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000

qdisc mq master bond0 state UP qlen 1000 <br>   link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff <br>   --- <br>   <br>   A similar bond0 exists on sgate2. <br>   <br>   I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py

for a while trying to figure it out, but have been unsuccessful so far.<br>   <br>   <br><br>_______________________________________________<br>gpfsug-discuss mailing list<br>gpfsug-discuss at spectrumscale.org</font><font size=3 color=blue face="Times New Roman"><u><br></u></font><a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss"><font size=2 color=blue><u>http://gpfsug.org/mailman/listinfo/gpfsug-discuss</u></font></a><br><br><BR>
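<font size=2 face="sans-serif">A minimal sketch of why a truncated `tsctl shownodes up` list could make healthy CES nodes look "down" in the comparison that getDownCesNodeList performs, as quoted in the thread above. It runs on sample data only and calls no GPFS commands. The hostnames, the helper names down_ces_nodes and last_entry_complete, and the temporary-file layout are illustrative assumptions, not part of GPFS.</font><br>

```shell
#!/bin/sh
# Sketch on made-up data: mimics the comm-based comparison quoted from
# getDownCesNodeList. Nothing here talks to GPFS.

# down_ces_nodes CES_LIST UP_LIST   (hypothetical helper)
#   CES_LIST: newline-separated CES node names
#   UP_LIST:  comma-separated names, the shape `tsctl shownodes up` prints
# Prints every CES node that is missing from UP_LIST, one per line.
down_ces_nodes() (
  LC_ALL=C; export LC_ALL          # same collation for sort and comm
  tmpdir=$(mktemp -d) || exit 1
  printf '%s\n' "$1" | sort > "$tmpdir/ces"
  printf '%s\n' "$2" | tr ',' '\n' | sort > "$tmpdir/up"
  comm -23 "$tmpdir/ces" "$tmpdir/up"   # lines only in the CES list
  rc=$?
  rm -rf "$tmpdir"
  exit "$rc"
)

# last_entry_complete UP_LIST SUFFIX   (hypothetical helper)
# Succeeds when the final comma-separated entry ends with SUFFIX.
last_entry_complete() {
  case ${1##*,} in
    *"$2") return 0 ;;
    *)     return 1 ;;
  esac
}

# Sample inputs (invented hostnames, not from the cluster in the thread)
ces_nodes='sgate1-opa.example.org
sgate2-opa.example.org'
full='node1-opa.example.org,node2-opa.example.org,sgate1-opa.example.org,sgate2-opa.example.org'
truncated='node1-opa.example.org,node2-opa.example.org,sgate1-opa.exa'

echo "down (full list):"
down_ces_nodes "$ces_nodes" "$full"
echo "down (truncated list):"
down_ces_nodes "$ces_nodes" "$truncated"

if ! last_entry_complete "$truncated" ".example.org"; then
  echo "last entry looks cut off: ${truncated##*,}"
fi
```

<font size=2 face="sans-serif">With the complete list the first call prints nothing. With the cut-off list both sample CES nodes are reported as down, which would be consistent with the "GPFS is down on this node" message if the output of tsctl is capped as suspected.</font><br>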