[gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS

Kumaran Rajaram kums at us.ibm.com
Mon Oct 15 23:34:50 BST 2018


Hi Alexander,

1. >>When writing to GPFS directly I'm able to write ~1800 files / second 
in a test setup. 
>>This is roughly the same on the protocol nodes (NSD client), as well as 
on the ESS IO nodes (NSD server). 

2. >> When writing to the NFS export on the protocol node itself (to avoid 
any network effects) I'm only able to write ~230 files / second.

IMHO, #2 (writing to the NFS export on the protocol node itself) should perform 
about the same as #1. The protocol node is also an NSD client, and when you 
write from a protocol node it uses the NSD protocol to write to the ESS IO 
nodes. In #1 you cite ~1800 files/sec from the protocol node, while in #2 you 
cite ~230 files/sec, which seems contradictory. 

>>Writing to the NFS export from another node (now including network 
latency) gives me ~220 files / second.

IMHO, this workload (single client, single thread, small files, single 
directory - tar xf) is synchronous in nature and results in only a single 
outstanding file being sent from the NFS client to the CES node at any time. 
Hence, performance will be limited by the network latency/capability between 
the NFS client and the CES node for such a small IO size (~5KB file size). 
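As a rough back-of-the-envelope check (the per-file operation count below is 
an assumption, not a measurement): at ~230 files/sec each file costs about 
4.3 ms end to end, and a tar extract typically issues several synchronous NFS 
round trips per file (create/open, write, commit/close, setattr), so even a 
modest per-round-trip latency adds up:

    1 s / 230 files              ~= 4.3 ms per file
    assume ~4 synchronous NFS ops per file
    4 ops * ~1 ms round trip     ~= 4 ms/file  ->  ~250 files/sec ceiling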

Also, what is the network interconnect/interface between the NFS client 
and the CES node? Note that at this rate the link bandwidth is not the 
limiting factor:

220 files/sec * 5 KiB file size ==> ~1.1 MB/s, 

which is far below even a 1 GigE link, let alone 10 GigE. This again points 
to per-file round-trip latency, rather than link bandwidth, as the bottleneck, 
so the latency characteristics of the path between the NFS client and the CES 
node matter more than its throughput. 
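A quick way to sanity-check this on the client (standard Linux/nfs-utils 
tools; the mount point below is a placeholder):

    # raw round-trip latency between NFS client and CES node
    ping -c 100 <ces-node>

    # per-NFS-operation average RTT and retransmits for the mount
    mountstats /mnt/nfs_export

    # negotiated mount options (vers, wsize, sync/async, ...)
    mount | grep nfs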

>> I'm aware that 'the real thing' would be to work with larger files in a 
multithreaded manner from multiple nodes - and that this scenario will 
scale quite well.

Yes, larger file sizes + multiple threads + multiple NFS client nodes will 
scale performance further by keeping more NFS I/O requests scheduled/pipelined 
over the network and processed in parallel on the CES nodes. 
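As an illustration only (archive names, paths and the split into independent 
archives are assumptions), driving the CES node with several parallel streams 
would look roughly like:

    # extract N independent archives in parallel into separate directories
    N=8
    for i in $(seq 1 $N); do
        mkdir -p /mnt/nfs_export/out.$i
        ( cd /mnt/nfs_export/out.$i && tar xf /data/archive.$i.tar ) &
    done
    wait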

>> I just want to ensure that I'm not missing something obvious over 
reiterating that massage to customers.

Adding the NFS experts/team for advice. 

My two cents.

Best Regards,
-Kums





From:   "Alexander Saupp" <Alexander.Saupp at de.ibm.com>
To:     gpfsug-discuss at spectrumscale.org
Date:   10/15/2018 02:20 PM
Subject:        [gpfsug-discuss] Tuning: single client, single thread, 
small files - native Scale vs NFS
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Dear Spectrum Scale mailing list,

I'm part of IBM Lab Services - currently I have multiple customers asking me 
to optimize a similar workload.

The task is to tune a Spectrum Scale system (comprising ESS and CES 
protocol nodes) for the following workload: 
A single Linux NFS client mounts an NFS export and extracts a flat tar 
archive containing lots of ~5KB files. 
I'm measuring the speed at which those 5KB files are written (`time tar xf 
archive.tar`). 
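For reference, a minimal sketch of how such a test set can be generated and 
measured (file count, sizes and paths are illustrative assumptions):

    # create 10,000 files of ~5 KB each and pack them into a flat archive
    mkdir -p /tmp/smallfiles && cd /tmp/smallfiles
    for i in $(seq 1 10000); do head -c 5120 /dev/urandom > file.$i; done
    tar cf /tmp/archive.tar .

    # extract on the filesystem under test and derive files/sec
    cd /mnt/target && start=$SECONDS
    tar xf /tmp/archive.tar
    echo "$(( 10000 / (SECONDS - start) )) files/sec"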

I do understand that Spectrum Scale is not designed for such a workload 
(single client, single thread, small files, single directory), and that such 
a benchmark is not an appropriate way to benchmark the system. 
Yet I find myself explaining the performance of such scenarios (git clone, 
...) quite frequently, as customers insist that optimizing this scenario 
matters to individual users, since it directly determines how long their 
tasks take.
I want to make sure that I have optimized the system as much as possible 
for the given workload, and that I have not overlooked something obvious.


When writing to GPFS directly I'm able to write ~1800 files / second in a 
test setup. 
This is roughly the same on the protocol nodes (NSD client), as well as on 
the ESS IO nodes (NSD server). 
When writing to the NFS export on the protocol node itself (to avoid any 
network effects) I'm only able to write ~230 files / second.
Writing to the NFS export from another node (now including network 
latency) gives me ~220 files / second.


There seems to be a huge performance degradation caused by adding NFS-Ganesha 
to the software stack alone, and I wonder what can be done to minimize that 
impact.
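One way to see where the time goes (a sketch; nfsstat ships with nfs-utils, 
the paths are placeholders) is to count the NFS operations a single 
extraction generates - with synchronous/stable-write semantics I'd expect 
several round trips per file:

    # snapshot client-side NFS op counters, run the workload, compare
    nfsstat -c > /tmp/before.txt
    time tar xf archive.tar -C /mnt/nfs_export/testdir
    nfsstat -c > /tmp/after.txt
    diff /tmp/before.txt /tmp/after.txt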


- Ganesha doesn't seem to support 'async' or 'no_wdelay' options... is 
anything equivalent available? (A client-side mount-option sketch follows 
after this list.)
- Is there an expected advantage of using the network-latency tuned profile, 
as opposed to the ESS default throughput-performance profile?
- Are there other relevant Kernel params?
- Is there an expected advantage of raising the number of threads (NSD 
server (nsd*WorkerThreads) / NSD client (workerThreads) / Ganesha 
(NB_WORKER)) for the given workload (single client, single thread, small 
files)?
- Are there other relevant GPFS params?
- Impact of sync replication, disk latency, etc. is understood. 
- I'm aware that 'the real thing' would be to work with larger files in a 
multithreaded manner from multiple nodes - and that this scenario will 
scale quite well.
I just want to ensure that I'm not missing something obvious while 
reiterating that message to customers.
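Regarding the Ganesha 'async'/'no_wdelay' point above, the closest thing I 
can experiment with is on the client side. A hedged sketch (server name, 
export and mount point are placeholders; 'async' and 'nocto' relax 
close-to-open consistency and durability guarantees, so this is an 
experiment, not a recommendation):

    # client-side mount options sometimes tried for small-file workloads
    mount -t nfs -o vers=3,async,nocto,actimeo=60,noatime \
        ces-node:/export /mnt/nfs_export

    # compare tuned profiles on the client / CES node (revert afterwards)
    tuned-adm profile network-latency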

Any help was greatly appreciated - thanks much in advance!
Alexander Saupp
IBM Germany


Mit freundlichen Grüßen / Kind regards

Alexander Saupp

IBM Systems, Storage Platform, EMEA Storage Competence Center


Phone: +49 7034-643-1512
Mobile: +49-172 7251072
Email: alexander.saupp at de.ibm.com

IBM Deutschland GmbH
Am Weiher 24
65451 Kelsterbach
Germany

IBM Deutschland GmbH / Chairman of the Supervisory Board: Martin Jetter
Management: Matthias Hartmann (Chairman), Norbert Janzen, Stefan Lutz, 
Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt
Registered office: Ehningen / Registration court: Amtsgericht Stuttgart, 
HRB 14562 / WEEE-Reg.-No. DE 99369940 
