[gpfsug-discuss] CCR troubles - CCR vs systemd
Aaron Knister
aaron.s.knister at nasa.gov
Thu Jul 28 22:29:22 BST 2016
Hi Marc,
I've seen systemd be overly helpful (read: not at all helpful) when it
observes state changing outside of its control. I once hit a bug with
GPFS (the real issue may have been in systemd, but the fix went into
GPFS) where systemd would unmount GPFS filesystems a split second after
they were mounted. The fs would mount, but systemd decided the /dev/$fs
device wasn't "ready", so it helpfully unmounted the filesystem. I don't
know much about systemd (I've been avoiding it), but based on my
experience with it I could certainly see a case where systemd actively
kills the sdrserv process shortly after the mm* commands start it, if
systemd doesn't expect it to be running.
I'd be curious to see the contents of /var/adm/ras/mmsdrserv.log from
the manager nodes, to see if sdrserv is indeed starting but getting
harpooned by systemd.
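
For what it's worth, here is a rough sketch of how one might check this
on a manager node. Only the log path and the process names are taken
from this thread; the function names are made up for illustration, and
it is purely an assumption on my part that systemd would leave any trace
in the journal if it were the one killing sdrserv.

```shell
#!/bin/bash
# Sketch only. The helper names below are hypothetical; the log path
# and the process names are the ones that appear in this thread.

# Print the last N lines (default 50) of the sdrserv log, if readable.
show_sdrserv_log() {
    local log="${1:-/var/adm/ras/mmsdrserv.log}"
    local lines="${2:-50}"
    if [ -r "$log" ]; then
        tail -n "$lines" "$log"
    else
        echo "cannot read $log" >&2
        return 1
    fi
}

# Read ps output on stdin and keep only the mmsdrserv and
# mmccrmonitor lines (this also drops the "grep mm" line itself).
filter_ccr_procs() {
    grep -E '/usr/lpp/mmfs/bin/(mmsdrserv|mmccrmonitor)'
}

# Look for recent journal entries that mention the GPFS helpers.
# Assumption: if systemd acts on these processes, it logs something.
journal_mentions() {
    journalctl --no-pager --since=-15min | grep -i -E 'mmsdrserv|mmccr'
}
```

The idea would be to run an mm* command such as mmlsconfig, then run
`ps auwx | filter_ccr_procs` a few times over the next minute, followed
by `show_sdrserv_log` and `journal_mentions`. If sdrserv shows up and
then disappears while the log shows a clean start, that would at least
point towards something outside GPFS killing it.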
-Aaron
On 7/28/16 4:16 PM, Marc A Kaplan wrote:
> Allow me to restate and demonstrate:
>
> Even if systemd or any explicit kill signals destroy any/all running
> mmccr* and mmsdr* processes,
>
> simply running mmlsconfig will fire up new mmccr* and mmsdr* processes.
> For example:
>
> ## I used kill -9 to kill all mmccr, mmsdr, lxtrace, ... processes
>
> [root@n2 gpfs-git]# ps auwx | grep mm
> root 9891 0.0 0.0 112640 980 pts/1 S+ 12:57 0:00 grep --color=auto mm
>
> [root@n2 gpfs-git]# mmlsconfig
> Configuration data for cluster madagascar.frozen:
> -------------------------------------------------
> clusterName madagascar.frozen
> ...
> worker1Threads 1022
> adminMode central
>
> File systems in cluster madagascar.frozen:
> ------------------------------------------
> /dev/mak
> /dev/x1
> /dev/yy
> /dev/zz
>
> ## mmlsconfig "needs" ccr and sdrserv, so if it doesn't see them, it
> restarts them!
>
> [root@n2 gpfs-git]# ps auwx | grep mm
> root 9929 0.0 0.0 114376 1696 pts/1 S 12:58 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> root 10110 0.0 0.0 20536 128 ? Ss 12:58 0:00 /usr/lpp/mmfs/bin/lxtrace-3.10.0-123.el7.x86_64 on /tmp/mmfs/lxtrac
> root 10125 0.0 0.0 493264 11064 ? Ssl 12:58 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1
> root 10358 0.0 0.0 1700488 17636 ? Sl 12:58 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py
> root 10440 0.0 0.0 114376 804 pts/1 S 12:59 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> root 10442 0.0 0.0 112640 976 pts/1 S+ 12:59 0:00 grep --color=auto mm
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776