[gpfsug-discuss] GPFS autoload - wait for IB ports to become active

Frederick Stock stockf at us.ibm.com
Fri Mar 16 12:05:29 GMT 2018


I have my doubts that mmdiag can be used in this script.  In general the 
guidance is to avoid or be very careful with mm* commands in a callback 
due to the potential for deadlock.
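
One way to sidestep that concern, sketched here under assumptions rather than as a tested recipe, is to hand the port list to the callback as arguments and read only sysfs, so no mm* command runs inside the callback. The function name ib_ports_active, the IB_SYSFS_ROOT override and the example port mlx5_0/1 are hypothetical; how the arguments reach the script (for example through the callback's parameter string) should be checked against the mmaddcallback documentation.

```shell
#!/bin/bash
# Sketch: check IB port state from sysfs only, with no mm* commands.
# Ports are passed as arguments in device/port form (e.g. mlx5_0/1, a
# hypothetical name). IB_SYSFS_ROOT is a hypothetical override so the
# check can be tried against a scratch directory.

ib_ports_active() {
    local root="${IB_SYSFS_ROOT:-/sys/class/infiniband}"
    local port
    for port in "$@"; do
        # ${port%/*} is the device name, ${port##*/} the port number.
        grep -q ACTIVE "$root/${port%/*}/ports/${port##*/}/state" 2>/dev/null || return 1
    done
    return 0
}

# Only act when ports were given; with no arguments there is nothing to check.
if [ "$#" -gt 0 ]; then
    ib_ports_active "$@" || exit 1
fi
```

Called as "fail-if-ibfail mlx5_0/1 mlx5_0/2", this exits 1 as soon as one listed port is not ACTIVE, which is the same behaviour as the quoted script without the mmdiag call.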

Fred
__________________________________________________
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
stockf at us.ibm.com



From:   Jan-Frode Myklebust <janfrode at tanso.net>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   03/16/2018 04:30 AM
Subject:        Re: [gpfsug-discuss] GPFS autoload - wait for IB ports to become active
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Thanks Olaf, but we don't use NetworkManager on this cluster..

I now created this simple script:


-------------------------------------------------------------------------------------------------------------------------------------------------------------
#! /bin/bash -
#
# Fail mmstartup if not all configured IB ports are active.
#
# Install with:
#
# mmaddcallback fail-if-ibfail --command /var/mmfs/etc/fail-if-ibfail --event preStartup --sync --onerror shutdown
#

for port in $(/usr/lpp/mmfs/bin/mmdiag --config | grep verbsPorts | cut -f 4- -d " ")
do
    grep -q ACTIVE "/sys/class/infiniband/${port%/*}/ports/${port##*/}/state" || exit 1
done
-------------------------------------------------------------------------------------------------------------------------------------------------------------

which I haven't tested, but assume should work. Suggestions for 
improvements would be much appreciated!



  -jf


On Thu, Mar 15, 2018 at 6:30 PM, Olaf Weiser <olaf.weiser at de.ibm.com> 
wrote:

you can try :
systemctl enable  NetworkManager-wait-online
ln -s '/usr/lib/systemd/system/NetworkManager-wait-online.service' \
  '/etc/systemd/system/multi-user.target.wants/NetworkManager-wait-online.service'


in many cases .. it helps .. 





From:        Jan-Frode Myklebust <janfrode at tanso.net>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        03/15/2018 06:18 PM
Subject:        Re: [gpfsug-discuss] GPFS autoload - wait for IB ports to become active
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



I found some discussion on this at 
https://www.ibm.com/developerworks/community/forums/html/threadTopic?id=77777777-0000-0000-0000-000014471957&ps=25
and there it's claimed that none of the callback events are early enough 
to resolve this, and that we would need a pre-preStartup trigger. Any idea 
if this has changed -- or is the only callback option then to do an 
"--onerror shutdown" if IB has failed to connect?


On Thu, Mar 8, 2018 at 1:42 PM, Frederick Stock <stockf at us.ibm.com> wrote:
You could also use the GPFS preStartup callback (mmaddcallback) to execute 
a script synchronously that waits for the IB ports to become available 
before returning and allowing GPFS to continue.  Not systemd integrated 
but it should work.
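
A minimal sketch of such a waiting script, under assumptions: the ports are passed as device/port arguments, and the names wait_for_ib_ports, IB_SYSFS_ROOT, IB_WAIT_TRIES and IB_WAIT_SLEEP are hypothetical knobs invented for illustration, not GPFS settings. The defaults (60 tries, 5 seconds apart) simply mirror the ibready script quoted further down in this thread.

```shell
#!/bin/bash
# Sketch of a synchronous callback script that blocks until all listed
# IB ports are ACTIVE, or gives up after a bounded number of tries.
# Arguments: device/port pairs (e.g. mlx5_0/1, a hypothetical name).
# IB_SYSFS_ROOT, IB_WAIT_TRIES and IB_WAIT_SLEEP are hypothetical overrides.

wait_for_ib_ports() {
    local root="${IB_SYSFS_ROOT:-/sys/class/infiniband}"
    local tries="${IB_WAIT_TRIES:-60}"
    local delay="${IB_WAIT_SLEEP:-5}"
    local port pending
    while [ "$tries" -gt 0 ]; do
        tries=$((tries - 1))
        pending=0
        for port in "$@"; do
            # Any port that is missing or not ACTIVE keeps us waiting.
            grep -q ACTIVE "$root/${port%/*}/ports/${port##*/}/state" 2>/dev/null || pending=1
        done
        if [ "$pending" -eq 0 ]; then
            return 0
        fi
        sleep "$delay"
    done
    return 1   # timed out
}

# Only act when ports were given; with no arguments there is nothing to wait for.
if [ "$#" -gt 0 ]; then
    wait_for_ib_ports "$@" || exit 1
fi
```

Bounding the wait matters because the callback runs synchronously: an unbounded loop would hold up GPFS startup indefinitely on a node whose link never comes up.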

Fred
__________________________________________________
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
stockf at us.ibm.com



From:        david_johnson at brown.edu
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        03/08/2018 07:34 AM
Subject:        Re: [gpfsug-discuss] GPFS autoload - wait for IB ports to become active
Sent by:        gpfsug-discuss-bounces at spectrumscale.org




Until IBM provides a solution, here is my workaround. Add it so it runs 
before the gpfs script; I call it from our custom xcat diskless boot 
scripts. Based on rhel7, not fully systemd integrated. YMMV!

Regards, 
 — ddj
——-
[ddj at storage041 ~]$ cat /etc/init.d/ibready 
#! /bin/bash
#
# chkconfig: 2345 06 94
# /etc/rc.d/init.d/ibready
# written in 2016 David D Johnson (ddj <at> brown.edu)
#
### BEGIN INIT INFO
# Provides:             ibready
# Required-Start:
# Required-Stop:
# Default-Stop:
# Description: Block until infiniband is ready
# Short-Description: Block until infiniband is ready
### END INIT INFO

RETVAL=0
if [[ -d /sys/class/infiniband ]] 
then
        IBDEVICE=$(dirname $(grep -il infiniband /sys/class/infiniband/*/ports/1/link* | head -n 1))
fi
# See how we were called.
case "$1" in
  start)
        if [[ -n $IBDEVICE && -f $IBDEVICE/state ]]
        then
                echo -n "Polling for InfiniBand link up: "
                for (( count = 60; count > 0; count-- ))
                do
                        if grep -q ACTIVE $IBDEVICE/state
                        then
                                echo ACTIVE
                                break
                        fi
                        echo -n "."
                        sleep 5
                done
                if (( count <= 0 ))
                then
                        echo DOWN - $0 timed out
                fi
        fi
        ;;
  stop|restart|reload|force-reload|condrestart|try-restart)
        ;;
  status)
        if [[ -n $IBDEVICE && -f $IBDEVICE/state ]]
        then
                echo "$IBDEVICE is $(< $IBDEVICE/state) $(< $IBDEVICE/rate)"
        else
                echo "No IBDEVICE found"
        fi
        ;;
  *)
        echo "Usage: ibready {start|stop|status|restart|reload|force-reload|condrestart|try-restart}"
        exit 2
esac
exit ${RETVAL}
————

  -- ddj
Dave Johnson

On Mar 8, 2018, at 6:10 AM, Caubet Serrabou Marc (PSI) <marc.caubet at psi.ch> wrote:

Hi all,

with autoload = yes there is no guarantee that GPFS starts only after the 
IB link is up. Is there a way to make GPFS wait until the IB ports are up 
before starting? This can probably be done by adding something like 
After=network-online.target and Wants=network-online.target in the systemd 
file but I would like to know if this is natively possible from the GPFS 
configuration.
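
For illustration only, the systemd idea could be sketched as a drop-in written by a small shell function. The unit name gpfs.service, the drop-in file name wait-online.conf and the function name write_ib_dropin are assumptions; the two directives are the ones named above. Note that network-online.target only helps if some wait-online service actually delays it until the links are up.

```shell
#!/bin/bash
# Sketch: write a systemd drop-in that orders a unit after
# network-online.target. The unit name (gpfs.service) and the drop-in
# file name (wait-online.conf) are assumptions for illustration.

write_ib_dropin() {
    local unit="${1:-gpfs.service}"
    local base="${2:-/etc/systemd/system}"
    local dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/wait-online.conf" <<'EOF'
[Unit]
After=network-online.target
Wants=network-online.target
EOF
}

# Example (as root), followed by a daemon reload:
#   write_ib_dropin gpfs.service && systemctl daemon-reload
```

A drop-in leaves the packaged unit file untouched, so the change should survive package updates that replace the unit itself.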

Thanks a lot,
Marc                
_________________________________________
Paul Scherrer Institut 
High Performance Computing
Marc Caubet Serrabou
WHGA/036
5232 Villigen PSI
Switzerland

Telephone: +41 56 310 46 67
E-Mail: marc.caubet at psi.ch
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss