<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<font size="-1">Hello,<br>
No, nothing has been done yet: we first have to drain 2 PB of data into
slower storage, so it will take a few weeks. I expect to do it in
the second half of August.<br>
Will let you all know the results once it's done and properly tested.<br>
<br>
Salvatore <br>
</font><br>
<div class="moz-cite-prefix">On 22/07/15 13:58, Muhammad Habib
wrote:<br>
</div>
<blockquote
cite="mid:CANLvgREUbru2OKdgorXanE3TQBZ0QZCrGkyJ8GX9C=0QP2yLqg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>did you implement it ? looks ok. All daemon traffic
should be going through black network including inter-cluster
daemon traffic ( assume black subnet routable). All data
traffic should be going through the blue network. You may
need to run iptrace or tcpdump to make sure proper network are
in use. You can always open a PMR if you having issue during
the configuration . <br>
<br>
</div>
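As a quick check (interface names here are illustrative, and this assumes the GPFS daemon is on its default TCP port 1191), tcpdump on each interface shows which network the daemon traffic is actually using:<br>

```shell
# Watch GPFS daemon traffic on the interface that should carry it
# (default mmfsd port is 1191; replace eth1 with your black-network interface).
tcpdump -ni eth1 'tcp port 1191'

# If daemon packets show up on the data (blue) interface instead,
# the subnets/interface configuration is not taking effect.
tcpdump -ni eth2 'tcp port 1191'
```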
Thanks<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Jul 15, 2015 at 5:19 AM,
Salvatore Di Nardo <span dir="ltr"><<a
moz-do-not-send="true" href="mailto:sdinardo@ebi.ac.uk"
target="_blank">sdinardo@ebi.ac.uk</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> Thanks for the
input; this is actually very interesting! <br>
<br>
Reading here: <a moz-do-not-send="true"
href="https://www.ibm.com/developerworks/community/wikis/home?lang=en#%21/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview"
target="_blank">https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview</a>
, <br>
specifically the "Using more than one network" section, it
seems to me that this way we should be able to split the
lease/token/ping traffic from the data.<br>
<br>
Suppose I implement a GSS cluster with only NSD servers and
a second cluster with only clients:<br>
<br>
<img alt="" src="cid:part3.01090508.02020908@ebi.ac.uk"
height="440" width="776"><br>
<br>
As far as I understood, if on the NSD cluster I add first the
subnet <a moz-do-not-send="true"
href="http://10.20.0.0/16" target="_blank">10.20.0.0/16</a>
and then 10.30.0.0, it should use the internal network for
all the node-to-node communication, leaving <a
moz-do-not-send="true" href="http://10.30.0.0/30"
target="_blank">10.30.0.0/30</a> only for data traffic
with the remote cluster (the clients). Similarly, in the
client cluster, adding first <a moz-do-not-send="true"
href="http://10.10.0.0/16" target="_blank">10.10.0.0/16</a>
and then 10.30.0.0 will guarantee that the node-to-node
communication passes through a different interface than the
one the data passes through. Since the clients are just
"clients", the traffic through <a moz-do-not-send="true"
href="http://10.10.0.0/16" target="_blank">10.10.0.0/16</a>
should be minimal (only token, lease, ping and so on) and
not affected by the rest. At this point it should also be
possible to move the "admin network" onto the internal
interface, so we effectively split all the "non-data"
traffic onto a dedicated interface.<br>
<br>
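A minimal sketch of that ordering with mmchconfig (the subnet values match the diagram above; whether the change needs a daemon restart, and the exact list per cluster, would need to be confirmed against the docs):<br>

```shell
# On the NSD/GSS cluster: prefer the internal (black) 10.20.0.0 network
# for daemon node-to-node traffic; 10.30.0.0 is then used for the
# inter-cluster traffic with the client cluster.
mmchconfig subnets="10.20.0.0 10.30.0.0"

# On the client cluster: prefer the internal 10.10.0.0 network first.
mmchconfig subnets="10.10.0.0 10.30.0.0"

# The subnets setting takes effect on daemon restart
# (mmshutdown/mmstartup); verify afterwards with:
mmlsconfig subnets
```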
I'm wondering if I'm missing something and, in case I
didn't miss anything, what the real traffic on the internal
(black) networks would be (is a 1G link fine, or do I still
need 10G for that?). Another thing I'm wondering about is
the load of the "non-data" traffic between the clusters; I
suppose some "daemon traffic" goes through the blue
interface for the inter-cluster communication. <br>
<br>
<br>
Any thoughts ?<br>
<br>
Salvatore<br>
<br>
<div>On 13/07/15 18:19, Muhammad Habib wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>Did you look at the "subnets" parameter used with
the "mmchconfig" command? I think you can use an
ordered list of subnets for daemon communication, and
then the actual daemon interface can be used for data
transfer. When GPFS starts, it will use the actual
daemon interface for communication; however, once it
is started, it will use the IPs from the subnet list,
whichever comes first in the list. To further
validate, you can put a network sniffer in place
before the actual implementation, or alternatively
you can open a PMR with IBM. <br>
<br>
</div>
If your cluster is having expel situations, you may
fine-tune it, e.g. increase the ping timeout period,
have multiple NSD servers, and distribute filesystems
across these NSD servers. Critical servers can also
have HBA cards installed for direct I/O over fiber. <br>
<br>
</div>
Thanks<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Jul 13, 2015 at 11:22
AM, Jason Hick <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:jhick@lbl.gov" target="_blank">jhick@lbl.gov</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="auto">
<div>Hi,</div>
<div><br>
</div>
<div>Yes having separate data and management
networks has been critical for us for keeping
health monitoring/communication unimpeded by
data movement.</div>
<div><br>
</div>
<div>Not as important, but you can also tune the
networks differently (packet sizes, buffer
sizes, SAK, etc) which can help.</div>
<div><br>
</div>
<div>Jason</div>
<div><br>
On Jul 13, 2015, at 7:25 AM, Vic Cornell <<a
moz-do-not-send="true"
href="mailto:viccornell@gmail.com"
target="_blank">viccornell@gmail.com</a>>
wrote:<br>
<br>
</div>
<blockquote type="cite">
<div>Hi Salvatore,
<div><br>
</div>
<div>I agree that that is what the manual, and some
of the wiki entries, say.</div>
<div><br>
</div>
<div>However, when we have had problems
(typically congestion) with Ethernet
networks in the past (20GbE or 40GbE), we
have resolved them by setting up a
separate “Admin” network.</div>
<div><br>
</div>
<div>The before-and-after difference in cluster
health, measured in the number of expels and
waiters, has been very marked.</div>
<div><br>
</div>
<div>Maybe someone “in the know” could
comment on this split.</div>
<div><br>
</div>
<div>Regards,</div>
<div><br>
</div>
<div>Vic</div>
<div><br>
<br>
<div>
<blockquote type="cite">
<div>On 13 Jul 2015, at 14:29,
Salvatore Di Nardo <<a
moz-do-not-send="true"
href="mailto:sdinardo@ebi.ac.uk"
target="_blank">sdinardo@ebi.ac.uk</a>>
wrote:</div>
<br>
<div>
<div text="#000000"
bgcolor="#FFFFFF"> <font
size="-1">Hello Vic.<br>
We are currently draining our
GPFS to do all the recabling
to add a management network,
but looking at what the admin
interface does (man mmchnode),
it says something different:<br>
<br>
</font>
<blockquote>
<blockquote><pre>--admin-interface={hostname | ip_address}
     Specifies the name of the node to be used by GPFS administration
     commands when communicating between nodes. The admin node name must
     be specified as an IP address or a hostname that is resolved by the
     host command to the desired IP address. If the keyword DEFAULT is
     specified, the admin interface for the node is set to be equal to
     the daemon interface for the node.
</pre></blockquote>
</blockquote>
<font size="-1"><br>
So it seems to be used only for command
propagation, and hence has nothing to do with
the node-to-node traffic. In fact, the other
interface's description is:<br>
</font><big><font size="-1"><big><tt><br>
</tt></big></font></big>
<blockquote>
<blockquote><pre>--daemon-interface={hostname | ip_address}
     Specifies the host name or IP address <u><b>to be used by the GPFS
     daemons for node-to-node communication</b></u>. The host name or IP
     address must refer to the communication adapter over which the GPFS
     daemons communicate. Alias interfaces are not allowed. Use the
     original address or a name that is resolved by the host command to
     that original address.
</pre></blockquote>
</blockquote>
<font size="-1"><br>
The "expired lease" issue and the file
locking mechanism (most of our expels
happen when 2 clients try to write to the
same file) are exactly node-to-node
communication, so I'm wondering what the
point of separating the "admin network"
is. I want to be sure to plan the right
changes before we undertake such a massive
task. We are talking about adding a new
interface on 700 clients, so the recabling
work is not small. <br>
<br>
<br>
Regards,<br>
Salvatore<br>
<br>
<br>
</font><br>
<div>On 13/07/15 14:00, Vic
Cornell wrote:<br>
</div>
<blockquote type="cite"> Hi Salvatore,
<div><br>
</div>
<div><span
style="white-space:pre-wrap"></span>Does
your GSS have the facility for
a 1GbE “management” network?
If so I think that changing
the “admin” node names of the
cluster members to a set of
IPs on the management network
would give you the split that
you need.</div>
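<div>That change might look something like this (the -mgmt hostnames are hypothetical placeholders that would resolve to the 1GbE management IPs):</div>

```shell
# Point the admin interface of each server at its management-network
# name, leaving the daemon interface (data) untouched.
mmchnode --admin-interface=gss01a-mgmt.ebi.ac.uk -N gss01a.ebi.ac.uk
mmchnode --admin-interface=gss01b-mgmt.ebi.ac.uk -N gss01b.ebi.ac.uk

# Verify the split: the admin node name should now differ from the
# daemon node name in the cluster listing.
mmlscluster
```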
<div><br>
</div>
<div>What about the clients? Can
they also connect to a
separate admin network?</div>
<div><br>
</div>
<div>Remember that if you are
using multi-cluster all of the
nodes in both networks must
share the same admin network.</div>
<div>
<div>
<div><br>
</div>
</div>
</div>
<div>
<div
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span>Kind Regards,</span></div>
<div
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span><br>
</span></div>
<div
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span>Vic</span></div>
<span
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><br>
</span><span></span> </div>
<br>
<div>
<blockquote type="cite">
<div>On 13 Jul 2015, at
13:31, Salvatore Di Nardo
<<a
moz-do-not-send="true"
href="mailto:sdinardo@ebi.ac.uk"
target="_blank">sdinardo@ebi.ac.uk</a>>
wrote:</div>
<br>
<div>
<div text="#000000"
bgcolor="#FFFFFF"><font
size="-1">Anyone? </font>
<br>
<br>
<div>On 10/07/15 11:07,
Salvatore Di Nardo
wrote:<br>
</div>
<blockquote type="cite"><font size="-1">Hello guys.<br>
Quite a while ago I mentioned
that we have a big expel issue
on our GSS (first gen), and
quite a lot of people suggested
that the root cause could be
that we use the same interface
for all the traffic, and that
we should split the data
network from the admin network.
We could finally plan a
downtime, and we are migrating
the data out, so I can soon
safely play with the change;
but looking at what exactly I
should do, I'm a bit puzzled.
Our mmlscluster looks like
this:<br>
<br>
</font>
<blockquote>
<blockquote>
<blockquote><pre>GPFS cluster information
========================
  GPFS cluster name:         GSS.ebi.ac.uk
  GPFS cluster id:           17987981184946329605
  GPFS UID domain:           GSS.ebi.ac.uk
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    gss01a.ebi.ac.uk
  Secondary server:  gss02b.ebi.ac.uk

 Node  Daemon node name   IP address  Admin node name    Designation
-----------------------------------------------------------------------
   1   gss01a.ebi.ac.uk   10.7.28.2   gss01a.ebi.ac.uk   quorum-manager
   2   gss01b.ebi.ac.uk   10.7.28.3   gss01b.ebi.ac.uk   quorum-manager
   3   gss02a.ebi.ac.uk   10.7.28.67  gss02a.ebi.ac.uk   quorum-manager
   4   gss02b.ebi.ac.uk   10.7.28.66  gss02b.ebi.ac.uk   quorum-manager
   5   gss03a.ebi.ac.uk   10.7.28.34  gss03a.ebi.ac.uk   quorum-manager
   6   gss03b.ebi.ac.uk   10.7.28.35  gss03b.ebi.ac.uk   quorum-manager
</pre></blockquote>
</blockquote>
</blockquote>
<font size="-1"><br>
It was my understanding that
the "admin node" should use a
different interface (a 1G
copper link should be fine),
while the daemon node is where
the data passes, so it should
point to the bonded 10G
interfaces. But when I read
the mmchnode man page I start
to get quite confused. It says:<br>
<br>
</font>
<pre>--daemon-interface={hostname | ip_address}
     Specifies the host name or IP address <u><b>to be used by the GPFS
     daemons for node-to-node communication</b></u>. The host name or IP
     address must refer to the communication adapter over which the GPFS
     daemons communicate. Alias interfaces are not allowed. Use the
     original address or a name that is resolved by the host command to
     that original address.

--admin-interface={hostname | ip_address}
     Specifies the name of the node to be used by GPFS administration
     commands when communicating between nodes. The admin node name must
     be specified as an IP address or a hostname that is resolved by the
     host command to the desired IP address. If the keyword DEFAULT is
     specified, the admin interface for the node is set to be equal to
     the daemon interface for the node.
</pre>
<font size="-1"><br>
What exactly does
"node-to-node communication"
mean?<br>
Does it mean DATA, or also the
"lease renew" and the token
communication between the
clients to get/steal the locks
needed to manage concurrent
writes to the same file?<br>
Since we are getting expels
(especially when several
clients contend for the same
file), I assumed I have to
split this type of packet from
the data stream; but reading
the documentation, it looks to
me that this internal
communication between nodes
uses the daemon interface,
which I suppose is also used
for the data. So HOW exactly
can I split them?<br>
</font><font size="-1"><br>
</font><font size="-1"><br>
Thanks in advance,<br>
Salvatore<br>
</font><font size="-1"><br>
</font><br>
<fieldset></fieldset>
<br>
<pre>_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at <a moz-do-not-send="true" href="http://gpfsug.org/" target="_blank">gpfsug.org</a>
<a moz-do-not-send="true" href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" target="_blank">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a>
</pre>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<span class="HOEnZb"><font color="#888888"> <br>
-- <br>
<div>This communication contains confidential
information intended only for the persons to
whom it is addressed. Any other distribution,
copying or disclosure is strictly prohibited. If
you have received this communication in error,
please notify the sender and delete this e-mail
message immediately.<br>
<br>
Le présent message contient des renseignements
de nature confidentielle réservés uniquement à
l'usage du destinataire. Toute diffusion,
distribution, divulgation, utilisation ou
reproduction de la présente communication, et de
tout fichier qui y est joint, est strictement
interdite. Si vous avez reçu le présent message
électronique par erreur, veuillez informer
immédiatement l'expéditeur et supprimer le
message de votre ordinateur et de votre serveur.</div>
</font></span></div>
</blockquote>
<br>
</div>
<br>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<br>
</div>
<br>
</blockquote>
<br>
</body>
</html>