[gpfsug-discuss] Same file opened by many nodes / processes

IBM Spectrum Scale scale at us.ibm.com
Tue Jul 10 23:15:01 BST 2018


Regarding the permissions on the file, I assume you are not using ACLs, 
correct?  If you are, then you would need to check what the ACL allows.
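
If ACLs are in use, a quick way to see what they actually grant is 
mmgetacl; a minimal sketch, with a placeholder path:

    # Standard POSIX mode bits for the file
    ls -l /gpfs/gpfs1/shared/input.dat

    # The GPFS ACL (POSIX or NFSv4) for the file, if one is set
    mmgetacl /gpfs/gpfs1/shared/input.dat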

Is your metadata on separate NSDs?  Having metadata on separate NSDs, and 
preferably fast NSDs, would certainly help your mmbackup scanning.
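
You can confirm the current layout with mmlsdisk; a rough sketch, with 
the file system name as a placeholder:

    # The "holds metadata" / "holds data" columns show whether any NSDs
    # are dedicated to metadata (metadataOnly) or shared with data
    mmlsdisk gpfs1

    # Capacity and free space per disk and per storage pool
    mmdf gpfs1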

Have you looked at the information from netstat or similar network tools 
to see how your network is performing?  Faster networks generally require 
a bit of OS tuning and some GPFS tuning to optimize their performance.
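
For example (a sketch only, not a complete health check), something 
along these lines would show TCP retransmissions at the OS level and 
what GPFS itself sees of its peer connections:

    # OS view: rising retransmission counters suggest congestion or loss
    netstat -s | grep -i retrans

    # GPFS view: state of the connections to the other nodes
    mmdiag --network

    # Long-running waiters often point at a congested link or busy node
    mmdiag --waiters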



Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479
. 

If your query concerns a potential software error in Spectrum Scale (GPFS) 
and you have an IBM software maintenance contract please contact 
1-800-237-5511 in the United States or your local IBM Service Center in 
other countries. 

The forum is informally monitored as time permits and should not be used 
for priority messages to the Spectrum Scale (GPFS) team.



From:   Peter Childs <p.childs at qmul.ac.uk>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   07/10/2018 05:23 PM
Subject:        Re: [gpfsug-discuss] Same file opened by many nodes / 
processes
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Oh, the cluster is currently 296 nodes, with a set size of 300 
(mmcrfs -n 300).

We're currently looking to upgrade the 1Gb-connected nodes to 10Gb within 
the next few months.



Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London


---- Peter Childs wrote ----

The reason I think the metanode is moving around is that I'd made a 
limited attempt to track it down using "mmfsadm saferdump file", and it 
moved before I'd identified the correct metanode. But I might have been 
chasing ghosts, so it may be operating normally and there may be nothing 
to worry about.
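
The way I was trying to follow it was roughly this (a sketch only; the 
path is a placeholder and the dump format varies between releases, so 
treat the grep as a rough filter rather than an exact recipe):

    # Inode number of the shared file
    ls -i /gpfs/gpfs1/shared/input.dat

    # On a node with the file open, dump the open file objects and look
    # at the lines around that inode for the metanode details
    mmfsadm saferdump file | grep -A 20 '<inode-number>'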

The user reading the file only has read access to it, based on the file 
permissions.

Mmbackup has only slowed down while this job has been running. As I say, 
the scan for what to back up usually takes 40-60 minutes, but is currently 
taking 3-4 hours with these jobs running. I've seen it take 3 days when 
our storage went bad (slow and failing disks), but that is usually a sign 
of a bad disk, and pulling the disk and rebuilding the RAID "fixed" it 
straight away. I can't see anything like that currently, however.

It might be that it's network congestion we're suffering from and nothing 
to do with token management, but as the mmpmon bytes-read data is running 
very high with this job and the load is spread over 50+ nodes, it's 
difficult to see one culprit. It's a mixed-speed Ethernet network, mainly 
10Gb connected, although the nodes in question are legacy with only 1Gb 
connections (and 40Gb to the back of the storage).
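
For reference, the bytes-read figures come from something along these 
lines, run on the busy nodes (a rough sketch rather than the exact 
invocation we use):

    # Per-file-system I/O counters on this node in machine-readable form;
    # _br_ is cumulative bytes read, _bw_ bytes written
    echo "fs_io_s" | mmpmon -p

    # Repeat indefinitely every 10 seconds to watch the rate
    echo "fs_io_s" | mmpmon -p -r 0 -d 10000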

We're currently running 4.2.3-8

Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London

---- IBM Spectrum Scale wrote ----

What is in the dump that indicates the metanode is moving around?  Could 
you please provide an example of what you are seeing?

You noted that the access is all read only; is the file opened for read 
only or for read and write?
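
If it helps, one quick way to check on a node where the processes are 
running is lsof (a sketch; the path is a placeholder). The FD column 
shows r for read only, and w or u if the file is open for writing:

    # Which processes have the file open, and in what mode
    lsof /gpfs/gpfs1/shared/input.dat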

What makes you state that this particular file is interfering with the 
scan done by mmbackup?  Reading a file, no matter how large, should not 
significantly impact a policy scan.

What version of Spectrum Scale are you running and how large is your 
cluster?

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479
. 

If your query concerns a potential software error in Spectrum Scale (GPFS) 
and you have an IBM software maintenance contract please contact 
1-800-237-5511 in the United States or your local IBM Service Center in 
other countries. 

The forum is informally monitored as time permits and should not be used 
for priority messages to the Spectrum Scale (GPFS) team.



From:        Peter Childs <p.childs at qmul.ac.uk>
To:        "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:        07/10/2018 10:51 AM
Subject:        [gpfsug-discuss] Same file opened by many nodes / 
processes
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



We have a situation where the same file is being read by around 5000
"jobs". This is an array job in UGE with a task concurrency (tc) limit
set, so the file in question is being opened by about 100 processes/jobs
at the same time.
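
The submission is roughly of this form (a sketch; the script name and 
limits are placeholders):

    # 5000-task array job, with at most 100 tasks running at once
    qsub -t 1-5000 -tc 100 analysis_job.sh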

It's a ~200GB file, so copying the file locally first is not an easy
answer, and these jobs are causing issues with mmbackup scanning the
file system, in that the scan is taking 3 hours instead of the normal
40-60 minutes.

This is read-only access to the file; I don't know the specifics of
the job.

It looks like the metanode is moving around a fair amount (given what I
can see from "mmfsadm saferdump file").

I'm wondering if there is anything we can do to improve things or
anything that can be tuned within GPFS. I don't think we have an issue
with token management, but would increasing maxFilesToCache on our token
manager node help, say?
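
Something like this is what I had in mind (a sketch only; the node 
class name and value are placeholders, not a recommendation):

    # Current value
    mmlsconfig maxFilesToCache

    # Raise it on a particular set of nodes only; the change normally
    # takes effect once GPFS is restarted on those nodes
    mmchconfig maxFilesToCache=100000 -N managerNodes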

Is there anything else I should look at to try to allow
GPFS to share this file better?

Thanks in advance

Peter Childs

-- 
Peter Childs
ITS Research Storage
Queen Mary, University of London
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




