[gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage

Alex Chekholko alex at calicolabs.com
Tue Feb 27 22:25:30 GMT 2018


Hi,

My experience has been that you could spend the same money to just make
your main pool more performant.  Instead of doing two data transfers (one
from cold pool to AFM or hot pools, one from AFM/hot to client), you can
just make the direct access of the data faster by adding more resources to
your main pool.
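
For example (a sketch only - the file system name "gpfs01" and the NSD
stanza file below are placeholders), growing the existing data pool with
flash NSDs and rebalancing onto them could look like:

  # Add the new NSDs described in the stanza file (which assigns them to
  # the existing data pool) to the file system.
  mmadddisk gpfs01 -F /tmp/new_nsds.stanza
  # Rebalance existing blocks across all disks in each pool; this is
  # I/O-heavy, so typically run during a quiet window.
  mmrestripefs gpfs01 -b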

Regards,
Alex

On Thu, Feb 22, 2018 at 5:27 PM, <valleru at cbio.mskcc.org> wrote:

> Thanks, I will try the file heat feature, but I am really not sure if it
> would work - since the code can access cold files too, not necessarily
> files that were recently accessed/hot.
>
> With respect to LROC. Let me explain as below:
>
> The use case is this:
> As a first step, the code reads headers (a small region of data) from
> thousands of files - for example, about 30,000 of them, each about 300MB
> to 500MB in size.
> After the first step, with the help of those headers, it mmaps/seeks
> across various regions of a set of files in parallel.
> Since it is all small IOs, and reading from GPFS over the network
> directly from disk was really slow, our idea was to use AFM, which I
> believe fetches all of a file's data into flash/SSD once the initial few
> blocks of the file are read.
> But again, AFM does not seem to solve the problem, so I want to know if
> LROC behaves the same way as AFM - where all of the file data is
> prefetched in full blocks, using all the worker threads, once a few
> blocks of the file have been read.
>
> Thanks,
> Lohit
>
> On Feb 22, 2018, 4:52 PM -0500, IBM Spectrum Scale <scale at us.ibm.com>,
> wrote:
>
> My apologies for not being more clear on the flash storage pool.  I meant
> that this would be just another GPFS storage pool in the same cluster, so
> no separate AFM cache cluster.  You would then use the file heat feature to
> ensure more frequently accessed files are migrated to that all flash
> storage pool.
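>
> As a rough sketch of what that could look like (the file system name
> 'gpfs01', the pool names 'data' and 'flash', and all threshold values
> below are placeholders that would need to be adapted to your system):
>
>   # Enable file heat tracking; values are examples only.
>   mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10
>
> and a policy file (e.g. /tmp/heat.policy) along the lines of:
>
>   /* Pull the hottest files into the flash pool, up to 90% occupancy,
>      and push the coldest files back out when flash fills up. */
>   RULE 'hot_to_flash' MIGRATE FROM POOL 'data' WEIGHT(FILE_HEAT) TO POOL 'flash' LIMIT(90)
>   RULE 'cold_to_data' MIGRATE FROM POOL 'flash' THRESHOLD(90,75) WEIGHT(0 - FILE_HEAT) TO POOL 'data'
>
> applied periodically, for example from cron:
>
>   mmapplypolicy gpfs01 -P /tmp/heat.policy -I test   # -I yes to actually migrate
>
> Run this way, the flash pool stays populated with the most frequently
> accessed files without any manual copying.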
>
> As for LROC, could you please clarify what you mean by a few
> headers/stubs of the file?  In reading the LROC documentation and the
> LROC variables available in the mmchconfig command, I think you might
> want to take a look at the lrocDataStubFileSize variable, since it seems
> to apply to your situation.
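>
> As a sketch only (values are illustrative, 'clientNodes' is a placeholder
> node class, and the exact semantics and defaults should be checked in the
> mmchconfig documentation):
>
>   # Allow user data into the local read-only cache on the client nodes
>   # (this assumes an NSD with usage=localCache is already defined there).
>   mmchconfig lrocData=yes -N clientNodes
>   # Limit LROC to caching only an initial stub of each file, which may
>   # suit a workload that mostly re-reads file headers.
>   mmchconfig lrocDataStubFileSize=1024 -N clientNodes
>   # On a client node, mmdiag shows LROC usage and hit statistics.
>   mmdiag --lroc
>
> As I understand it, LROC is populated from blocks evicted from the page
> pool; it does not by itself prefetch the rest of a file.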
>
> Regards, The Spectrum Scale (GPFS) team
>
> ------------------------------------------------------------------------------------------------------------------
> If you feel that your question can benefit other users of Spectrum Scale
> (GPFS), then please post it to the public IBM developerWorks Forum at
> https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.
>
> If your query concerns a potential software error in Spectrum Scale (GPFS)
> and you have an IBM software maintenance contract please contact
> 1-800-237-5511 in the United States or your local IBM Service Center in
> other countries.
>
> The forum is informally monitored as time permits and should not be used
> for priority messages to the Spectrum Scale (GPFS) team.
>
>
>
> From:        valleru at cbio.mskcc.org
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Cc:        gpfsug-discuss-bounces at spectrumscale.org
> Date:        02/22/2018 04:21 PM
> Subject:        Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered
> storage
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> Thank you.
>
> I am sorry if I was not clear, but the metadata pool is all on SSDs in the
> GPFS clusters that we use. It is just the data pool that is on near-line
> rotating disks.
> I understand that AFM might not be able to solve the issue, and I will try
> and see if file heat works for migrating the files to the flash tier.
> You mentioned an all-flash storage pool for heavily used files - do you
> mean a different GPFS cluster just with flash storage, with the files
> copied to it manually whenever needed?
> The IO performance I am talking about is predominantly for reads, so are
> you saying that LROC can work the way I want it to? That is, prefetch
> whole files into the LROC cache after only a few headers/stubs of data
> have been read from those files?
> I thought LROC only keeps the blocks of data that were actually fetched
> from disk, and will not prefetch the whole file if just a stub of data is
> read.
> Please do let me know if I understood it wrong.
>
> On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale <scale at us.ibm.com>,
> wrote:
> I do not think AFM is intended to solve the problem you are trying to
> solve.  If I understand your scenario correctly you state that you are
> placing metadata on NL-SAS storage.  If that is true that would not be wise
> especially if you are going to do many metadata operations.  I suspect your
> performance issues are partially due to the fact that metadata is being
> stored on NL-SAS storage.  You stated that you did not think the file heat
> feature would do what you intended but have you tried to use it to see if
> it could solve your problem?  I would think having metadata on SSD/flash
> storage combined with an all-flash storage pool for your heavily used files
> would perform well.  If you expect IO usage will be such that there will be
> far more reads than writes then LROC should be beneficial to your overall
> performance.
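>
> A quick way to confirm where your metadata currently resides (the file
> system name 'gpfs01' is a placeholder):
>
>   # The "holds metadata" / "holds data" and storage pool columns show
>   # which NSDs carry metadata; ideally those are the SSD/flash NSDs.
>   mmlsdisk gpfs01 -L
>   # Capacity and usage broken out for the metadata disks.
>   mmdf gpfs01 -m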
>
> Regards, The Spectrum Scale (GPFS) team
>
> ------------------------------------------------------------------------------------------------------------------
> If you feel that your question can benefit other users of Spectrum Scale
> (GPFS), then please post it to the public IBM developerWorks Forum at
> https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.
>
> If your query concerns a potential software error in Spectrum Scale (GPFS)
> and you have an IBM software maintenance contract please contact
> 1-800-237-5511 in the United States or your local IBM Service Center in
> other countries.
>
> The forum is informally monitored as time permits and should not be used
> for priority messages to the Spectrum Scale (GPFS) team.
>
>
>
> From:        valleru at cbio.mskcc.org
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:        02/22/2018 03:11 PM
> Subject:        [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> Hi All,
>
> I am trying to figure out a GPFS tiering architecture with flash storage
> on the front end and near-line storage as the backend, for supercomputing.
>
> The backend storage will be GPFS on near-line disks, about 8-10PB. The
> backend storage will/can be tuned to give large streaming bandwidth, with
> enough metadata disks to make the stat of all these files fast enough.
>
> I was wondering whether it would be possible to use a GPFS flash or SSD
> cluster on the front end that uses AFM and acts as a cache cluster for
> the backend GPFS cluster.
>
> In the end, the workflow that I am targeting is one where:
>
>
> If the compute nodes read the headers of thousands of large files ranging
> from 100MB to 1GB, the AFM cluster should be able to spin up enough
> threads to bring all of those files from the backend to the faster
> SSD/flash GPFS cluster.
> The working set might be about 100T at a time, which I want to be on the
> faster/low-latency tier, with the rest of the files staying on the slower
> tier until they are read by the compute nodes.
>
> The reason I do not want to use GPFS policies to achieve the above is
> that I am not sure policies can be written in a way that moves files from
> the slower tier to the faster tier depending on how the jobs interact
> with the files.
> I know that policies can be written based on heat and size/format, but I
> don't think those policies work in the way described above.
>
> I did try the above architecture, where an SSD GPFS cluster acts as an
> AFM cache cluster in front of the near-line storage. However, the AFM
> cluster was really, really slow - it took a few hours to copy the files
> from the near-line storage to the AFM cache cluster.
> I am not sure if AFM is not designed to work this way, or if AFM is just
> not tuned to work as fast as it should.
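>
> Is gateway-side tuning the kind of thing that is supposed to help here?
> For example (the parameter values below are just guesses on my part, and
> 'gatewayNodes' is a placeholder node class):
>
>   # AFM parallel-read tuning on the cache/gateway nodes; see the AFM
>   # tuning documentation for units and sensible values.
>   mmchconfig afmNumReadThreads=8 -N gatewayNodes
>   mmchconfig afmParallelReadThreshold=1024 -N gatewayNodes
>   mmchconfig afmParallelReadChunkSize=128 -N gatewayNodes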
>
> I have tried LROC too, but it does not behave the same way as I guess
> AFM does.
>
> Has anyone tried, or does anyone know, whether GPFS supports an
> architecture where the fast tier can spin up thousands of threads and
> copy files almost instantly/asynchronously from the slow tier whenever
> the jobs on the compute nodes read a few blocks of those files?
> I understand that with respect to hardware, the AFM cluster should be
> really fast, as should the network between the AFM cluster and the
> backend cluster.
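>
> Or would the expected approach be to have the job scripts kick off an
> explicit prefetch before the reads start? Something like the sketch below
> (the file system 'gpfs01', the fileset 'cachefset' and the list file are
> made-up names), although I would prefer this to happen automatically on
> first access:
>
>   # Pre-populate the AFM cache fileset with the files a job is about to
>   # read, listed one path per line in /tmp/job_filelist.
>   mmafmctl gpfs01 prefetch -j cachefset --list-file /tmp/job_filelist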
>
> Please do also let me know if the above workflow can be done using GPFS
> policies and be as fast as it needs to be.
>
> Regards,
> Lohit
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
>
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>

