[gpfsug-discuss] Looking for a way to see which node is having an impact on server?
Vic Cornell
viccornell at gmail.com
Tue Dec 10 10:13:20 GMT 2013
Have you looked at mmpmon? It's a bit much for 600 nodes, but if you run it with a reasonable interval specified, the output shouldn't be too hard to parse.
Quick recipe:
Create a file called mmpmon.conf that looks like this:
################# cut here #########################
nlist add node1 node2 node3 node4 node5
io_s
reset
################# cut here #########################
where node1, node2, etc. are your node names. It's probably best to do this in batches of 50 or so.
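With 600 nodes, writing those input files by hand gets tedious. Below is a minimal Python sketch that reads a plain list of node names (one per line on stdin, which is an assumption about how you keep your node list) and produces one input file per batch of 50, each containing the same `nlist add` / `io_s` / `reset` requests shown above. The script name and the `mmpmon_batchNN.conf` file names are hypothetical.

```python
#!/usr/bin/env python3
"""Split a node list into mmpmon input files of ~50 nodes each (a sketch)."""
import sys
from pathlib import Path

BATCH_SIZE = 50  # "batches of 50 or so"


def render_batches(nodes, batch_size=BATCH_SIZE):
    """Return the text of one mmpmon input file per batch of node names."""
    return [
        # Same three requests as the hand-written mmpmon.conf above.
        "nlist add " + " ".join(nodes[i:i + batch_size]) + "\nio_s\nreset\n"
        for i in range(0, len(nodes), batch_size)
    ]


if __name__ == "__main__":
    # Hypothetical input format: one node name per line on stdin.
    names = [line.strip() for line in sys.stdin if line.strip()]
    for n, text in enumerate(render_batches(names)):
        path = Path(f"mmpmon_batch{n:02d}.conf")  # hypothetical file naming
        path.write_text(text)
        print(path)
```

For example, `python3 make_mmpmon_batches.py < nodes.txt` would write twelve files for 600 nodes. Run them one after another rather than in parallel, each with a finite `-r` count so it finishes before the next starts, because each mmpmon instance's `reset` clears the counters the others are reading.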
Then run something like:
/usr/lpp/mmfs/bin/mmpmon -i mmpmon.conf -d 10000 -r 0 -p
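That will give you a set of stats for each of your named nodes, aggregated over a 10-second period (-d is the delay in milliseconds between repetitions, -r 0 repeats indefinitely, and -p produces parseable output).
Don't run more than one of these at a time, as each one will reset the stats for the others :-)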
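Parse out the stats with something like:
/usr/lpp/mmfs/bin/mmpmon -i mmpmon.conf -d 10000 -r 0 -p | awk -F_ '{if ($2=="io"){print $8,$16/1024/1024,$18/1024/1024}}'
which will give you, for each node, the node name followed by the MiB read and MiB written during each 10-second interval (divide by 10 for MiB/s).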
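To go from raw numbers to "which node is hurting the servers", the same output can be ranked. The following is a minimal Python sketch, not a definitive tool. It assumes the `-p` io_s line format that the awk above relies on (`_nn_` node name, `_br_`/`_bw_` bytes read/written, `_rc_` return code) and parses it by keyword rather than by field position. It also assumes the counters are cumulative since the previous `reset`, so the first sample of a run is not a 10-second window. Keeping only each node's latest sample means that first sample is discarded as long as you run with `-r 2` or more. The script name `top_talkers.py` is hypothetical.

```python
#!/usr/bin/env python3
"""Rank nodes by GPFS I/O rate from `mmpmon -p` io_s output (a sketch)."""
import sys

INTERVAL_S = 10.0  # seconds per sample; assumed to match -d 10000 (milliseconds)
MIB = 1024 * 1024


def parse_io_s(line):
    """Return (node_name, bytes_read, bytes_written) for an _io_s_ line, else None."""
    tokens = line.split()
    if not tokens or tokens[0] != "_io_s_":
        return None  # ignore _nlist_ responses and anything else
    # After the leading _io_s_ tag, tokens alternate: _keyword_ value.
    fields = {tokens[i].strip("_"): tokens[i + 1] for i in range(1, len(tokens) - 1, 2)}
    if fields.get("rc") != "0":
        return None  # skip responses that report an error
    try:
        return fields["nn"], int(fields["br"]), int(fields["bw"])
    except (KeyError, ValueError):
        return None


def top_talkers(lines, interval_s=INTERVAL_S, limit=10):
    """Return [(node, read_MiB_s, write_MiB_s)], busiest first, from each node's latest sample."""
    latest = {}
    for line in lines:
        record = parse_io_s(line)
        if record is not None:
            node, bytes_read, bytes_written = record
            # Later samples overwrite earlier ones for the same node.
            latest[node] = (bytes_read / MIB / interval_s, bytes_written / MIB / interval_s)
    ranked = sorted(latest.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
    return [(node, r, w) for node, (r, w) in ranked[:limit]]


if __name__ == "__main__":
    for node, read_rate, write_rate in top_talkers(sys.stdin):
        print(f"{node:<24} read {read_rate:9.2f} MiB/s   write {write_rate:9.2f} MiB/s")
```

For example: `/usr/lpp/mmfs/bin/mmpmon -i mmpmon.conf -d 10000 -r 3 -p | python3 top_talkers.py`. A finite `-r` is needed here because the script only prints once its input ends.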
The docs (GPFS Advanced Administration Guide) are reasonable.
Cheers,
Vic Cornell
viccornell at gmail.com
On 9 Dec 2013, at 19:52, Alex Chekholko <chekh at stanford.edu> wrote:
> Hi Richard,
>
> I would just use something like 'iftop', 'collectl', or 'dstat' to look at the traffic between the nodes.
>
> e.g. dstat -N eth0 --gpfs --gpfs-ops --top-cpu-adv --top-io 2 10
> http://dag.wiee.rs/home-made/dstat/
>
> For the NSD balance question, since GPFS stripes the blocks evenly across all the NSDs, they will end up balanced over time. Or you can rebalance manually with 'mmrestripefs -b' or similar.
>
> It is unlikely that particular files ended up on a single NSD, unless the other NSDs are totally full.
>
> Regards,
> Alex
>
> On 12/06/2013 04:31 PM, Richard Lefebvre wrote:
>> Hi,
>>
>> I'm looking for a way to see which node (or nodes) is putting load
>> on the GPFS server nodes and slowing down the whole file system. What
>> usually happens is that a user is doing I/O that doesn't fit the
>> configuration of the GPFS file system or the guidance given on how to
>> use it efficiently: typically a lot of unbuffered, byte-sized, very
>> random I/O on a file system that was built for large files and a large
>> block size.
>>
>> My problem is finding out who is doing that. With over 600 client
>> nodes, I haven't found a way to pinpoint the node or nodes that could
>> be the source of the problem.
>>
>> I tried "mmlsnode -N waiters -L", but there are so many waiters that
>> I can't pinpoint anything.
>>
>> I must be missing something simple. Can anyone help?
>>
>> Note: there is another thing I'm trying to pinpoint. Adding a new NSD
>> created a temporary imbalance. It seems that a group of files was
>> created on that NSD, and a user keeps hitting it, causing a high load.
>> I'm trying to pinpoint the origin of that too, at least until everything
>> is balanced again. But will rebalancing spread those files out, given
>> that they are already on the emptiest NSD?
>>
>> Richard
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at gpfsug.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>
> --
> Alex Chekholko chekh at stanford.edu