[gpfsug-discuss] infiniband fabric instability effects

david_johnson at brown.edu
Fri Sep 13 10:14:06 BST 2019


Restarting the subnet manager is, in general, fairly harmless. It will cause a heavy sweep of the fabric when it comes back up, but there should be no LID renumbering. Traffic may be held up while the fabric is scanned and the routing tables are rebuilt.
Losing the subnet manager for a period of time would prevent newly booted nodes from receiving a LID, but existing nodes will continue to function.
Adding or deleting inter-switch links should probably be avoided while the subnet manager is down. I would also avoid changing the routing algorithm while in production.
Moving a non-HA subnet manager from primary to backup and back again has worked for us without disruption, but I would still try to do this in a maintenance window.
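
If you want to confirm the "no LID renumbering" expectation on your own fabric, one option is to snapshot the port GUID to LID mapping before and after the restart and compare the two. The following is only a minimal sketch: the function names and the sample GUIDs are hypothetical, and how you collect the mapping (for example by parsing ibnetdiscover or saquery output) is left open here.

```python
# Sketch only: compare two {port_guid: lid} snapshots taken before and
# after a subnet manager restart. Function names and sample data are
# illustrative assumptions, not part of any InfiniBand tool.

def lid_changes(before, after):
    """Return {guid: (old_lid, new_lid)} for ports whose LID differs."""
    common = before.keys() & after.keys()
    return {
        guid: (before[guid], after[guid])
        for guid in common
        if before[guid] != after[guid]
    }


def missing_ports(before, after):
    """Return GUIDs seen before the restart but not afterwards."""
    return sorted(before.keys() - after.keys())


if __name__ == "__main__":
    # Hypothetical snapshots; real ones would come from your fabric.
    before = {"0x0002c9030009e9b1": 1, "0x0002c9030009e9b2": 2}
    after = {"0x0002c9030009e9b1": 1, "0x0002c9030009e9b2": 7}
    print("renumbered:", lid_changes(before, after))
    print("missing:", missing_ports(before, after))
```

An empty result from both functions would be consistent with the restart having left existing nodes alone. A port that shows up as missing may simply have been down at the time of the second snapshot.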

  -- ddj
Dave Johnson

> On Sep 13, 2019, at 4:48 AM, Lehmann, Greg (IM&T, Pullenvale) <Greg.Lehmann at csiro.au> wrote:
> 
> Hi All,
> I was wondering what effect restarting the subnet manager has on an active Spectrum Scale filesystem. Is there any scope for data loss or corruption? A second, similar scenario of slightly longer duration is failover to a secondary subnet manager because the primary has crashed. What effect would that have on the filesystem?
>  
> Cheers,
>  
> Greg Lehmann
> Senior High Performance Data Specialist
> Data Services | Scientific Computing Platforms
> Information Management and Technology  |  CSIRO 
> Greg.Lehmann at csiro.au  |  +61 7 3327 4137 |
> 1 Technology Court, Pullenvale, QLD 4069