[gpfsug-discuss] Unexpected data in message/Bad message

IBM Spectrum Scale scale at us.ibm.com
Sun Nov 11 18:07:17 GMT 2018


Hi Aaron,

The header dump shows that all zeroes were received for the header, so there
is no valid magic, version, originator, etc. The "512 more bytes" would have
been the payload that follows the header. This is very unexpected, hence the
shutdown.
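
For anyone who wants to check a dump like this quickly, here is a minimal
Python sketch that pulls the hex groups out of a "Header dump:" log line and
lists any that are not zero. It only looks at the text of the log; the
function names are made up for illustration, and it assumes nothing about
what each group in the header means.

```python
import re

def parse_header_dump(log_text):
    """Return the hex groups that follow 'Header dump:' as a list of strings."""
    # Undo mail/terminal line wrapping so a dump split across lines reads as one run.
    flat = " ".join(log_text.split())
    # Each group must be pure hex and end at a space or end of text,
    # so a following word such as "Tue" is never partially matched.
    match = re.search(r"Header dump:((?: [0-9A-Fa-f]+(?= |$))+)", flat)
    if match is None:
        return []
    return match.group(1).split()

def nonzero_groups(groups):
    """Return (zero-based index, group) for every group whose value is not zero."""
    return [(i, g) for i, g in enumerate(groups) if int(g, 16) != 0]

# Sample taken from the log excerpt quoted in this thread.
sample = """Tue Nov  6 15:10:34.939 2018: [E] Unexpected data in message. Header
dump: 00000000 0000 0000 00000047 00000000 00 00 0000 00000000 00000000
0000 0000
Tue Nov  6 15:10:34.942 2018: [E] [0/0] 512 more bytes were available:"""

groups = parse_header_dump(sample)
print(len(groups), "groups; non-zero:", nonzero_groups(groups))
```

On the dump quoted in this thread this finds 12 groups, all zero apart from
the fourth (00000047); nothing in this thread says what that group encodes.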

Logs around that event involving the machines noted in that trace would be
required to evaluate further. This is not common.
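
As a rough aid for collecting those logs, the sketch below keeps only the
entries within a few seconds of the event that mention a given node. It is an
illustration, not a supported tool: the helper names are hypothetical, the
timestamp format is assumed from the excerpt in this thread, and lines without
a timestamp are assumed to be wrapped continuations of the previous entry.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the excerpt, e.g. "Tue Nov  6 15:10:34.939 2018: "
STAMP = re.compile(r"^(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2}\.\d+ \d{4}): ")

def parse_entries(lines):
    """Group raw lines into (timestamp, text) entries."""
    entries = []
    for line in lines:
        m = STAMP.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S.%f %Y")
            entries.append([ts, line.strip()])
        elif entries and line.strip():
            # No timestamp: treat as a wrapped continuation of the previous entry.
            entries[-1][1] += " " + line.strip()
    return [(ts, text) for ts, text in entries]

def entries_near(entries, event_time, seconds, needle=None):
    """Keep entries within +/- seconds of event_time, optionally containing needle."""
    window = timedelta(seconds=seconds)
    return [(ts, text) for ts, text in entries
            if abs(ts - event_time) <= window
            and (needle is None or needle in text)]

# Sample lines copied from the excerpt in this thread.
sample = """Tue Nov  6 15:10:34.939 2018: [E] Unexpected data in message. Header
dump: 00000000 0000 0000 00000047 00000000 00 00 0000 00000000 00000000
0000 0000
Tue Nov  6 15:10:34.942 2018: [E] [0/0] 512 more bytes were available:
Tue Nov  6 15:10:34.965 2018: [N] Close connection to 10.100.X.X
nsdserver1 <c0n71> (Unexpected error 120)
Tue Nov  6 15:10:34.966 2018: [E] Network error on 10.100.X.X nsdserver1
<c0n71>, Check connectivity
Tue Nov  6 15:10:36.726 2018: [N] Restarting mmsdrserv""".splitlines()

entries = parse_entries(sample)
event = datetime(2018, 11, 6, 15, 10, 34, 939000)
near = entries_near(entries, event, seconds=1, needle="nsdserver1")
for ts, text in near:
    print(ts, text)
```

Running the same filter over the logs from each machine named in the trace,
with the same event time, gives a side-by-side view of what each node saw.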

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------

If you feel that your question can benefit other users of Spectrum Scale
(GPFS), then please post it to the public IBM developerWorks Forum at
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.


If your query concerns a potential software error in Spectrum Scale (GPFS)
and you have an IBM software maintenance contract please contact
1-800-237-5511 in the United States or your local IBM Service Center in
other countries.

The forum is informally monitored as time permits and should not be used
for priority messages to the Spectrum Scale (GPFS) team.



From:	Aaron Knister <aaron.s.knister at nasa.gov>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:	11/07/2018 06:38 PM
Subject:	[gpfsug-discuss] Unexpected data in message/Bad message
Sent by:	gpfsug-discuss-bounces at spectrumscale.org



We're experiencing client nodes falling out of the cluster with errors
that look like this:

Tue Nov  6 15:10:34.939 2018: [E] Unexpected data in message. Header
dump: 00000000 0000 0000 00000047 00000000 00 00 0000 00000000 00000000
0000 0000
Tue Nov  6 15:10:34.942 2018: [E] [0/0] 512 more bytes were available:
Tue Nov  6 15:10:34.965 2018: [N] Close connection to 10.100.X.X
nsdserver1 <c0n71> (Unexpected error 120)
Tue Nov  6 15:10:34.966 2018: [E] Network error on 10.100.X.X nsdserver1
<c0n71>, Check connectivity
Tue Nov  6 15:10:36.726 2018: [N] Restarting mmsdrserv
Tue Nov  6 15:10:38.850 2018: [E] Bad message
Tue Nov  6 15:10:38.851 2018: [X] The mmfs daemon is shutting down
abnormally.
Tue Nov  6 15:10:38.852 2018: [N] mmfsd is shutting down.
Tue Nov  6 15:10:38.853 2018: [N] Reason for shutdown: LOGSHUTDOWN called

The cluster is running various PTF levels of 4.1.1.

Has anyone seen this before? I'm struggling to understand what it means
from a technical point of view. Was GPFS expecting a larger message than
it received? Did it receive all of the bytes it expected and some of it
was corrupt? It says "512 more bytes were available" but then doesn't
show any additional bytes.

Thanks!

-Aaron

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



