[gpfsug-discuss] data interface and management interface.

Salvatore Di Nardo sdinardo at ebi.ac.uk
Wed Jul 22 14:51:04 BST 2015


Hello,
No, we haven't done anything yet, because we have to drain 2 PB of data
onto slower storage, so it will take a few weeks. I expect to do it in the
second half of August.
I will let you all know the results once it is done and properly tested.

Salvatore

On 22/07/15 13:58, Muhammad Habib wrote:
> Did you implement it? It looks OK. All daemon traffic should be going 
> through the black network, including inter-cluster daemon traffic 
> (assuming the black subnet is routable). All data traffic should be 
> going through the blue network. You may need to run iptrace or tcpdump 
> to make sure the proper networks are in use. You can always open a PMR 
> if you have issues during the configuration.
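>
> As a rough check (the interface names here are placeholders, and 1191 is 
> the default GPFS daemon port; NSD data flows over the same port, so the 
> point is to compare which interface each peer's traffic shows up on):
>
>     # watch the interface that should carry only daemon/admin traffic
>     tcpdump -i ethX -nn 'tcp port 1191'
>     # then the data interface, and compare peer IPs and traffic volume
>     tcpdump -i bond0 -nn 'tcp port 1191'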
>
> Thanks
>
> On Wed, Jul 15, 2015 at 5:19 AM, Salvatore Di Nardo 
> <sdinardo at ebi.ac.uk> wrote:
>
>     Thanks for the input.. this is actually very interesting!
>
>     Reading here:
>     https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview
>     specifically the "Using more than one network" part, it seems to
>     me that this way we should be able to split the lease/token/ping
>     traffic from the data.
>
>     Supposing that I implement a GSS cluster with only NSD servers and
>     a second cluster with only clients:
>
>
>
>     As far as I understood, if on the NSD cluster I add first the
>     subnet 10.20.0.0/16 and then 10.30.0.0, it should use the internal
>     network for all the node-to-node communication, leaving 10.30.0.0/30
>     only for data traffic with the remote cluster (the clients).
>     Similarly, in the client cluster, adding first 10.10.0.0/16 and
>     then 10.30.0.0 will guarantee that the node-to-node communication
>     passes through a different interface than the one the data passes
>     over. Since the clients are just "clients", the traffic through
>     10.10.0.0/16 should be minimal (only token, lease, ping and so on)
>     and not affected by the rest. It should be possible at this point
>     to move also the "admin network" onto the internal interface, so we
>     effectively split all the "non-data" traffic onto a dedicated
>     interface.
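>
>     If I read the wiki right, that ordering would be expressed with the
>     "subnets" option of mmchconfig you mentioned, roughly like this
>     (the subnet values are just the example ones above, and as far as I
>     know the daemons need a restart before the setting takes effect):
>
>         # on the GSS/NSD cluster: prefer the internal network first
>         mmchconfig subnets="10.20.0.0 10.30.0.0"
>
>         # on the client cluster: same idea with its own internal network
>         mmchconfig subnets="10.10.0.0 10.30.0.0"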
>
>     I'm wondering if I'm missing something, and in case I'm not, what
>     the real traffic on the internal (black) networks would be (is a 1G
>     link fine, or do I still need 10G for that?). Another thing I'm
>     wondering about is the load of the "non-data" traffic between the
>     clusters... I suppose some "daemon traffic" goes through the blue
>     interface for the inter-cluster communication.
>
>
>     Any thoughts ?
>
>     Salvatore
>
>     On 13/07/15 18:19, Muhammad Habib wrote:
>>     Did you look at the "subnets" parameter used with the "mmchconfig"
>>     command? I think you can use an ordered list of subnets for daemon
>>     communication, and then the actual daemon interface can be used
>>     for data transfer. When GPFS starts it will use the actual daemon
>>     interface for communication; however, once it has started, it will
>>     use the IPs from the subnet list, whichever comes first in the
>>     list. To validate further, you can put a network sniffer in place
>>     before you do the actual implementation, or alternatively you can
>>     open a PMR with IBM.
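>>
>>     For a quick check of what is actually in effect, something like
>>     this should do (both are standard GPFS administration commands,
>>     though the output details vary by release):
>>
>>         mmlsconfig subnets    # show the configured subnets list, if any
>>         mmdiag --network      # show which peer IP addresses the daemon really uses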
>>
>>     If your cluster is having expel situations, you may fine-tune your
>>     cluster, e.g. increase the ping timeout period, have multiple NSD
>>     servers and distribute filesystems across these NSD servers. Also,
>>     critical servers can have HBA cards installed for direct I/O over
>>     fibre.
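>>
>>     The ping-timeout side of that tuning would look roughly like this
>>     (parameter names as I recall them from the GPFS documentation; the
>>     values are only illustrative and worth agreeing with IBM support
>>     first):
>>
>>         mmlsconfig | grep -i -e ping -e failure   # see what is currently set
>>         mmchconfig minMissedPingTimeout=60
>>         mmchconfig failureDetectionTime=60        # needs GPFS down cluster-wide, as far as I know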
>>
>>     Thanks
>>
>>     On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick <jhick at lbl.gov> wrote:
>>
>>         Hi,
>>
>>         Yes, having separate data and management networks has been
>>         critical for us in keeping health monitoring/communication
>>         unimpeded by data movement.
>>
>>         Not as important, but you can also tune the networks
>>         differently (packet sizes, buffer sizes, SAK, etc.), which
>>         can help.
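>>
>>         As an illustration of the sort of tuning meant here, e.g.
>>         jumbo frames on the data interface and larger TCP buffers
>>         (the interface name and values are only illustrative, not
>>         recommendations):
>>
>>             ip link set dev <data_if> mtu 9000   # jumbo frames on the data network only
>>             sysctl -w net.core.rmem_max=16777216
>>             sysctl -w net.core.wmem_max=16777216
>>             sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
>>             sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"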
>>
>>         Jason
>>
>>         On Jul 13, 2015, at 7:25 AM, Vic Cornell
>>         <viccornell at gmail.com> wrote:
>>
>>>         Hi Salvatore,
>>>
>>>         I agree that that is what the manual and some of the wiki
>>>         entries say.
>>>
>>>         However, when we have had problems (typically congestion)
>>>         with ethernet networks in the past (20GbE or 40GbE), we have
>>>         resolved them by setting up a separate “Admin” network.
>>>
>>>         The before-and-after difference in cluster health, measured
>>>         in the number of expels and waiters, has been very marked.
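>>>
>>>         Those before/after numbers can be taken with something like
>>>         (standard GPFS commands; the log path is the usual default):
>>>
>>>             mmdiag --waiters                             # long waiters are a good health indicator
>>>             grep -ci expel /var/adm/ras/mmfs.log.latest  # count expel events logged on a node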
>>>
>>>         Maybe someone “in the know” could comment on this split.
>>>
>>>         Regards,
>>>
>>>         Vic
>>>
>>>
>>>>         On 13 Jul 2015, at 14:29, Salvatore Di Nardo
>>>>         <sdinardo at ebi.ac.uk> wrote:
>>>>
>>>>         Hello Vic.
>>>>         We are currently draining our GPFS to do all the recabling
>>>>         needed to add a management network, but looking at what the
>>>>         admin interface does (man mmchnode), it says something
>>>>         different:
>>>>
>>>>                 --admin-interface={hostname | ip_address}
>>>>                 Specifies the name of the node to be used by GPFS
>>>>                 administration commands when communicating between
>>>>                 nodes. The admin node name must be specified as an IP
>>>>                 address or a hostname that is resolved by the host
>>>>                 command to the desired IP address.  If the keyword
>>>>                 DEFAULT is specified, the admin interface for the
>>>>                 node is set to be equal to the daemon interface for
>>>>                 the node.
>>>>
>>>>
>>>>         So it seems to be used only for command propagation, and
>>>>         hence has nothing to do with the node-to-node traffic. In
>>>>         fact, the other interface's description is:
>>>>
>>>>                 --daemon-interface={hostname | ip_address}
>>>>                 Specifies the host name or IP address _*to be used
>>>>                 by the GPFS daemons for node-to-node
>>>>                 communication*_. The host name or IP address must
>>>>                 refer to the communication adapter over which the
>>>>                 GPFS daemons communicate. Alias interfaces are not
>>>>                 allowed. Use the original address or a name that is
>>>>                 resolved by the host command to that original
>>>>                 address.
>>>>
>>>>
>>>>         The "expired lease" issue and file locking mechanism a(
>>>>         most of our expells happens when 2 clients try to write in
>>>>         the same file) are exactly node-to node-comunication, so 
>>>>         im wondering what's the point to separate the "admin
>>>>         network".  I want to be sure to plan the right changes
>>>>         before we do a so massive task. We are talking about adding
>>>>         a new interface on 700 clients, so the recabling work its
>>>>         not small.
>>>>
>>>>
>>>>         Regards,
>>>>         Salvatore
>>>>
>>>>
>>>>
>>>>         On 13/07/15 14:00, Vic Cornell wrote:
>>>>>         Hi Salvatore,
>>>>>
>>>>>         Does your GSS have the facility for a 1GbE “management”
>>>>>         network? If so I think that changing the “admin” node
>>>>>         names of the cluster members to a set of IPs on the
>>>>>         management network would give you the split that you need.
>>>>>
>>>>>         What about the clients? Can they also connect to a
>>>>>         separate admin network?
>>>>>
>>>>>         Remember that if you are using multi-cluster, all of the
>>>>>         nodes in both networks must share the same admin network.
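>>>>>
>>>>>         Something along these lines, per node (the -mgmt hostnames
>>>>>         are made up for illustration and would resolve to addresses
>>>>>         on the 1GbE management network):
>>>>>
>>>>>             mmchnode --admin-interface=gss01a-mgmt.ebi.ac.uk -N gss01a.ebi.ac.uk
>>>>>             mmchnode --admin-interface=gss01b-mgmt.ebi.ac.uk -N gss01b.ebi.ac.uk
>>>>>             # ...and so on for the remaining NSD servers and the clients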
>>>>>
>>>>>         Kind Regards,
>>>>>
>>>>>         Vic
>>>>>
>>>>>
>>>>>>         On 13 Jul 2015, at 13:31, Salvatore Di Nardo
>>>>>>         <sdinardo at ebi.ac.uk> wrote:
>>>>>>
>>>>>>         Anyone?
>>>>>>
>>>>>>         On 10/07/15 11:07, Salvatore Di Nardo wrote:
>>>>>>>         Hello guys.
>>>>>>>         Quite a while ago I mentioned that we have a big expel
>>>>>>>         issue on our GSS (first gen), and quite a lot of people
>>>>>>>         suggested that the root cause could be that we use the
>>>>>>>         same interface for all the traffic, and that we should
>>>>>>>         split the data network from the admin network. Finally
>>>>>>>         we could plan a downtime and we are migrating the data
>>>>>>>         out, so I can soon safely play with the change, but
>>>>>>>         looking at what exactly I should do, I'm a bit puzzled.
>>>>>>>         Our mmlscluster looks like this:
>>>>>>>
>>>>>>>             GPFS cluster information
>>>>>>>             ========================
>>>>>>>             GPFS cluster name:         GSS.ebi.ac.uk
>>>>>>>             GPFS cluster id:           17987981184946329605
>>>>>>>             GPFS UID domain:           GSS.ebi.ac.uk
>>>>>>>             Remote shell command:      /usr/bin/ssh
>>>>>>>             Remote file copy command:  /usr/bin/scp
>>>>>>>
>>>>>>>             GPFS cluster configuration servers:
>>>>>>>             -----------------------------------
>>>>>>>             Primary server:    gss01a.ebi.ac.uk
>>>>>>>             Secondary server:  gss02b.ebi.ac.uk
>>>>>>>
>>>>>>>             Node  Daemon node name   IP address   Admin node name    Designation
>>>>>>>             ---------------------------------------------------------------------
>>>>>>>                1  gss01a.ebi.ac.uk   10.7.28.2    gss01a.ebi.ac.uk   quorum-manager
>>>>>>>                2  gss01b.ebi.ac.uk   10.7.28.3    gss01b.ebi.ac.uk   quorum-manager
>>>>>>>                3  gss02a.ebi.ac.uk   10.7.28.67   gss02a.ebi.ac.uk   quorum-manager
>>>>>>>                4  gss02b.ebi.ac.uk   10.7.28.66   gss02b.ebi.ac.uk   quorum-manager
>>>>>>>                5  gss03a.ebi.ac.uk   10.7.28.34   gss03a.ebi.ac.uk   quorum-manager
>>>>>>>                6  gss03b.ebi.ac.uk   10.7.28.35   gss03b.ebi.ac.uk   quorum-manager
>>>>>>>
>>>>>>>         It was my understanding that the "admin node" should use
>>>>>>>         a different interface (a 1G copper link should be fine),
>>>>>>>         while the daemon node is where the data passes, so it
>>>>>>>         should point to the bonded 10G interfaces. But when I
>>>>>>>         read the mmchnode man page I start to be quite confused.
>>>>>>>         It says:
>>>>>>>
>>>>>>>         --daemon-interface={hostname | ip_address}
>>>>>>>         Specifies the host name or IP address _*to be used by
>>>>>>>         the GPFS daemons for node-to-node communication*_. The
>>>>>>>         host name or IP address must refer to the communication
>>>>>>>         adapter over which the GPFS daemons communicate. Alias
>>>>>>>         interfaces are not allowed. Use the original address or
>>>>>>>         a name that is resolved by the host command to that
>>>>>>>         original address.
>>>>>>>
>>>>>>>         --admin-interface={hostname | ip_address}
>>>>>>>         Specifies the name of the node to be used by GPFS
>>>>>>>         administration commands when communicating between
>>>>>>>         nodes. The admin node name must be specified as an IP
>>>>>>>         address or a hostname that is resolved by the host
>>>>>>>         command to the desired IP address. If the keyword
>>>>>>>         DEFAULT is specified, the admin interface for the node
>>>>>>>         is set to be equal to the daemon interface for the node.
>>>>>>>
>>>>>>>         What exactly does "node-to-node communication" mean?
>>>>>>>         Does it mean DATA, or also the "lease renew" and the
>>>>>>>         token communication between the clients to get/steal
>>>>>>>         the locks needed to manage concurrent writes to the
>>>>>>>         same file? Since we are getting expels (especially when
>>>>>>>         several clients contend for the same file), I assumed I
>>>>>>>         had to split this type of packet from the data stream,
>>>>>>>         but reading the documentation it looks to me that this
>>>>>>>         internal communication between nodes uses the
>>>>>>>         daemon-interface, which I suppose is used also for the
>>>>>>>         data. So HOW exactly can I split them?
>>>>>>>
>>>>>>>
>>>>>>>         Thanks in advance,
>>>>>>>         Salvatore
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
> -- 
> This communication contains confidential information intended only for 
> the persons to whom it is addressed. Any other distribution, copying 
> or disclosure is strictly prohibited. If you have received this 
> communication in error, please notify the sender and delete this 
> e-mail message immediately.
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/jpeg
Size: 28904 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20150722/c54553d5/attachment-0003.jpe>

