[gpfsug-discuss] HAWC (Highly available write cache)

Mon Aug 1 20:50:09 BST 2016

Hi Tejas,

Do you know the workload in the VM?

The workload that enters HAWC may or may not match the workload that
eventually reaches the data pool. It all depends on whether the 4KB writes
entering HAWC can be coalesced. For example, sequential 4KB writes can all
be coalesced into a single large chunk, so 4KB writes into HAWC become 8MB
writes to the data pool (in your system). Random 4KB writes into HAWC,
however, may remain 4KB writes into the data pool if there are no adjoining
4KB writes (i.e., if the 4KB blocks are all dispersed, they can't be
coalesced). Either way, the goal of HAWC is to reduce application latency
by ensuring that the blocks are written back to the data pool in the
background. So while 4KB blocks may still be hitting the data pool,
hopefully the application is seeing the latency of your presumably
lower-latency system pool.
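
To make the coalescing point concrete, here is a minimal Python sketch. It
is a toy model only, not how GPFS implements HAWC write-back: the
coalesce() helper, the (offset, length) representation, and the two sample
workloads are all assumptions made for illustration. It merges writes only
when they are exactly adjacent and caps each merged extent at the 8MB block
size mentioned in this thread.

```python
KB = 1024
MB = 1024 * KB
BLOCK = 8 * MB  # filesystem block size from this thread


def coalesce(writes, max_chunk):
    """Merge exactly adjacent (offset, length) writes, capped at max_chunk bytes.

    Hypothetical helper for illustration; overlapping writes are not handled.
    """
    merged = []
    for offset, length in sorted(writes):
        if merged:
            last_off, last_len = merged[-1]
            adjacent = last_off + last_len == offset
            if adjacent and last_len + length <= max_chunk:
                merged[-1] = (last_off, last_len + length)
                continue
        merged.append((offset, length))
    return merged


# 2048 back-to-back 4KB writes cover exactly one 8MB block.
sequential = [(i * 4 * KB, 4 * KB) for i in range(2048)]
# One 4KB write at the start of each of 2048 different 8MB blocks.
scattered = [(i * BLOCK, 4 * KB) for i in range(2048)]

seq_out = coalesce(sequential, BLOCK)
rand_out = coalesce(scattered, BLOCK)

print(len(seq_out), seq_out[0][1] // MB)    # 1 extent of 8 MB
print(len(rand_out), rand_out[0][1] // KB)  # 2048 extents of 4 KB
```

In this model the sequential stream collapses into one 8MB write, while the
dispersed stream still produces 2048 separate 4KB writes.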

Dean

From:	Tejas Rao <raot at bnl.gov>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:	08/01/2016 12:06 PM
Subject:	Re: [gpfsug-discuss] HAWC (Highly available write cache)
Sent by:	gpfsug-discuss-bounces at spectrumscale.org

In my case GPFS storage is used to store VM images (KVM) and hence the
small IO.

I always see lots of small 4K writes, and the GPFS filesystem block size is
8MB. I thought the reason for the small writes was that the Linux kernel
asks GPFS to initiate a periodic sync, which by default happens every 5
seconds and can be controlled by "vm.dirty_writeback_centisecs".
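
For reference, the current value of that setting can be read from
/proc/sys/vm/dirty_writeback_centisecs on a Linux node. The small Python
sketch below does that and converts it to seconds; the helper names are
made up for illustration, and it returns None on systems where the file is
not available. Whether this interval actually explains the 4K writes seen
here is not something the sketch can tell you.

```python
from pathlib import Path

# Standard procfs location of the vm.dirty_writeback_centisecs sysctl.
SYSCTL_PATH = Path("/proc/sys/vm/dirty_writeback_centisecs")


def centisecs_to_seconds(value):
    """Convert a centisecond value (string or int) to seconds."""
    return int(value) / 100


def read_writeback_interval(path=SYSCTL_PATH):
    """Return the writeback interval in seconds, or None if it cannot be read."""
    try:
        return centisecs_to_seconds(path.read_text().strip())
    except (OSError, ValueError):
        return None


print(centisecs_to_seconds("500"))  # 5.0, the 5-second default mentioned above
print(read_writeback_interval())    # value on this node, or None
```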

I thought HAWC would help in such cases by hardening (coalescing) the
small writes in the "system" pool and then flushing them to the "data"
pool in larger blocks.

Note - I am not doing direct I/O explicitly.

On 8/1/2016 14:49, Sven Oehme wrote:
      When you say 'synchronous write', what do you mean by that?
      If you are talking about direct I/O (the O_DIRECT flag), those writes
      don't use the HAWC data path; that's by design.

      sven

      On Mon, Aug 1, 2016 at 11:36 AM, Tejas Rao <raot at bnl.gov> wrote:
        I have enabled the write cache (HAWC) by running the commands below.
        The recovery logs are supposedly placed in the replicated system
        metadata pool (SSDs). I do not have a "system.log" pool, as it is
        only needed if recovery logs are stored on the client nodes.

        mmchfs gpfs01 --write-cache-threshold 64K
        mmchfs gpfs01 -L 1024M
        mmchconfig logPingPongSector=no

        I have recycled the daemon on all nodes in the cluster (including
        the NSD nodes).

        I still see small synchronous writes (4K) from the clients going to
        the data drives (data pool). I am checking this by looking at
        "mmdiag --iohist" output. Should they not be going to the system
        pool?

        Do I need to do something else? How can I confirm that HAWC is
        working as advertised?

        Thanks.

        _______________________________________________
        gpfsug-discuss mailing list
        gpfsug-discuss at spectrumscale.org
        http://gpfsug.org/mailman/listinfo/gpfsug-discuss

