[gpfsug-discuss] CCR troubles - CCR vs systemd

Aaron Knister aaron.s.knister at nasa.gov
Thu Jul 28 22:29:22 BST 2016


Hi Marc,

I've seen systemd be overly helpful (read: not at all helpful) when it
observes state changing outside of its control. I once hit a bug with
GPFS (the real culprit may have been systemd, but the fix went into
GPFS) in which systemd unmounted GPFS filesystems a split second after
they were mounted. The fs would mount, but systemd decided the /dev/$fs
device wasn't "ready", so it helpfully unmounted the filesystem. I don't
know much about systemd (I avoid it), but based on my experience with it
I could certainly see a case where systemd actively kills the sdrserv
process shortly after the mm* commands start it, if systemd doesn't
expect it to be running.

I'd be curious to see /var/adm/ras/mmsdrserv.log from the manager nodes,
to check whether sdrserv is indeed starting but then getting harpooned
by systemd.
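One way to gather that evidence on a manager node is sketched below. It is a minimal sketch, not a GPFS tool: the function name `sdrserv_check` is hypothetical, and it assumes a Linux host with `/proc/<pid>/cgroup` and `pgrep` available. It prints the tail of the sdrserv log and, for any running mmsdrserv, the cgroup it lives in. On a systemd host that cgroup path names the unit (a `.service` or a session `.scope`) the process belongs to; with systemd's default `KillMode=control-group`, every process in a service's cgroup is killed when that service stops, which is one plausible way sdrserv could get reaped.

```shell
# Hypothetical helper (not part of GPFS): show recent mmsdrserv.log lines
# and the cgroup of any running mmsdrserv process.
sdrserv_check() {
  local log="${1:-/var/adm/ras/mmsdrserv.log}"
  if [ -r "$log" ]; then
    echo "== last lines of $log =="
    tail -n 20 "$log"
  else
    echo "cannot read $log" >&2
  fi
  # pgrep -x matches the exact process name, so pgrep never lists itself;
  # errors (e.g. pgrep missing) are discarded and the loop simply does nothing.
  for pid in $(pgrep -x mmsdrserv 2>/dev/null); do
    echo "== cgroup of mmsdrserv pid $pid =="
    # On systemd hosts this path shows which unit owns the process.
    cat "/proc/$pid/cgroup" 2>/dev/null
  done
}
```

Running `sdrserv_check` right after an mm* command, and again a few seconds later, would show whether the process disappears and which unit it belonged to.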

-Aaron

On 7/28/16 4:16 PM, Marc A Kaplan wrote:
> Allow me to restate and demonstrate:
>
> Even if systemd or any explicit kill signals destroy any/all running
> mmccr* and mmsdr* processes, simply running mmlsconfig will fire up new
> mmccr* and mmsdr* processes. For example:
>
> ## I used kill -9 to kill all mmccr, mmsdr, lxtrace, ... processes
>
> [root@n2 gpfs-git]# ps auwx | grep mm
> root      9891  0.0  0.0 112640   980 pts/1    S+   12:57   0:00 grep --color=auto mm
>
> [root@n2 gpfs-git]# mmlsconfig
> Configuration data for cluster madagascar.frozen:
> -------------------------------------------------
> clusterName madagascar.frozen
>    ...
> worker1Threads 1022
> adminMode central
>
> File systems in cluster madagascar.frozen:
> ------------------------------------------
> /dev/mak
> /dev/x1
> /dev/yy
> /dev/zz
>
> ## mmlsconfig "needs" ccr and sdrserv, so if it doesn't see them, it restarts them!
>
> [root@n2 gpfs-git]# ps auwx | grep mm
> root      9929  0.0  0.0 114376  1696 pts/1    S    12:58   0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> root     10110  0.0  0.0  20536   128 ?        Ss   12:58   0:00 /usr/lpp/mmfs/bin/lxtrace-3.10.0-123.el7.x86_64 on /tmp/mmfs/lxtrac
> root     10125  0.0  0.0 493264 11064 ?        Ssl  12:58   0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1
> root     10358  0.0  0.0 1700488 17636 ?       Sl   12:58   0:00 python /usr/lpp/mmfs/bin/mmsysmon.py
> root     10440  0.0  0.0 114376   804 pts/1    S    12:59   0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> root     10442  0.0  0.0 112640   976 pts/1    S+   12:59   0:00 grep --color=auto mm
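Marc's before/after check can be repeated without `grep` matching itself by filtering `ps auwx` output for just the CCR and SDR helpers. This is a minimal sketch: the function name `gpfs_cfg_procs` is hypothetical, and it assumes the default `ps auwx` column layout, where the PID is field 2 and the command line starts at field 11.

```shell
# Hypothetical helper: read `ps auwx` output on stdin and print the PID and
# command line of mmccrmonitor / mmsdrserv processes, skipping grep itself.
gpfs_cfg_procs() {
  awk '/mmccrmonitor|mmsdrserv/ && !/grep/ {
         # Rebuild the command from field 11 onward (runs of spaces collapse).
         cmd = ""
         for (i = 11; i <= NF; i++) cmd = cmd (i > 11 ? " " : "") $i
         print $2, cmd
       }'
}
```

Running `ps auwx | gpfs_cfg_procs` before and after `mmlsconfig` should show the mmsdrserv and mmccrmonitor entries reappearing, as in the transcript above.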

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
