<div dir="ltr">Aaron, thanks for jumping on board. It's nice to see others confirming this. Sometimes I feel alone on this topic. <div><br><div>It should also be possible to use ZFS, with ZVOLs presented as block devices, as a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant... but it should be possible. :) There are various reports about it. It might at least be worth looking into, compared to Linux "md raid", if one truly needs an all-software solution that already exists.  Something to think about and test.</div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister <span dir="ltr"><<a href="mailto:aaron.s.knister@nasa.gov" target="_blank">aaron.s.knister@nasa.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :)<br>

<br>

Bob, I know this doesn't help you today since I'm pretty sure it's not yet available, but if one scours the interwebs they can find mention of something called Mestor.<br>

<br>

There's very limited information here:<br>

<br>

- <a href="https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf" rel="noreferrer" target="_blank">https://indico.cern.ch/event/5<wbr>31810/contributions/2306222/at<wbr>tachments/1357265/2053960/Spec<wbr>trum_Scale-HEPIX_V1a.pdf</a><br>

- <a href="https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc" rel="noreferrer" target="_blank">https://www.yumpu.com/en/docum<wbr>ent/view/5544551/ibm-system-x-<wbr>gpfs-storage-server-stfc</a> (slide 20)<br>

<br>

Sounds like if it were available it would fit this use case very well.<br>

<br>

I also had preliminary success using sheepdog (<a href="https://sheepdog.github.io/sheepdog/" rel="noreferrer" target="_blank">https://sheepdog.github.io/sh<wbr>eepdog/</a>) as a backing store for GPFS in a similar situation. At a very high conceptual level it's perhaps similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to do much with it because the requirements changed.<br>

<br>

My money would be on your first option: creating local RAIDs and then replicating to give you availability in the event a node goes offline.<br>

<br>

-Aaron<span class=""><br>

<br>

<br>

On 11/30/16 10:59 PM, Zachary Giles wrote:<br>

</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">

Just remember that replication protects data availability, not<br>

integrity. GPFS still requires the underlying block device to return<br>

good data.<br>

<br>

If you're using it on plain disks (SAS or SSD), and the drive returns<br>

corrupt data, GPFS won't know any better and will just deliver it to the<br>

client. Further, if you do a partial read followed by a write, both<br>

replicas could be destroyed. There's also no efficient way to force use<br>

of a second replica if you realize the first is bad, short of taking the<br>

first entirely offline. In that case while migrating data, there's no<br>

good way to prevent read-rewrite of other corrupt data on your drive<br>

that has the "good copy" while restriping off a faulty drive.<br>
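<br>

The availability-versus-integrity point above can be shown with a toy model. This is only a sketch and not GPFS code: the helper names (read_plain, read_verified) and the stored SHA-256 checksum are assumptions made for illustration. It shows that a plain replicated read hands back whatever the first device returns, while a read that checks a stored checksum can skip a silently corrupted copy.<br>

```python
import hashlib


def checksum(block):
    """SHA-256 digest of a block; stands in for any end-to-end checksum."""
    return hashlib.sha256(block).hexdigest()


good = b"scratch data " * 4
stored_sum = checksum(good)  # recorded when the block was written

# Replica A suffers silent corruption: one flipped bit, no I/O error.
damaged = bytearray(good)
damaged[0] ^= 0x01
replica_a = bytes(damaged)
replica_b = good


def read_plain(replicas):
    """Hypothetical unverified read: trust whatever the first replica returns."""
    return replicas[0]


def read_verified(replicas, expected_sum):
    """Hypothetical verified read: return the first replica that matches."""
    for block in replicas:
        if checksum(block) == expected_sum:
            return block
    raise IOError("no replica matches the stored checksum")


print("plain read returns good data:   ", read_plain([replica_a, replica_b]) == good)
print("verified read returns good data:", read_verified([replica_a, replica_b], stored_sum) == good)
```

Without a checksum (or a RAID layer that validates what it returns), the two replicas simply disagree and nothing in the read path can tell which one is correct.<br>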

<br>

Ideally, RAID only returns data that has passed the RAID algorithm, so it<br>

should either be intact or have been rebuilt from parity. However, as we<br>

all know, RAID controllers are prone to failures of their own for many<br>

reasons. At least with RAID a drive can go bad in various ways (bad<br>

sectors, slow, just dead, poor SSD cell wear, etc.) without (hopefully)<br>

causing silent corruption.<br>

<br>

Just something to think about while considering replication.<br>

<br>

<br>

<br>

On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke <<a href="mailto:UWEFALKE@de.ibm.com" target="_blank">UWEFALKE@de.ibm.com</a><br></span><div><div class="h5">

<mailto:<a href="mailto:UWEFALKE@de.ibm.com" target="_blank">UWEFALKE@de.ibm.com</a>>> wrote:<br>

<br>

    I have once set up a small system with just a few SSDs in two NSD<br>

    servers,<br>

    providing a scratch file system in a computing cluster.<br>

    No RAID, two replicas.<br>

    It works, as long as the admins do not do silly things (like rebooting servers<br>

    in sequence without checking for disks being up in between).<br>

    Going for RAIDs without GPFS replication protects you against single<br>

    disk<br>

    failures, but you're lost if just one of your NSD servers goes off.<br>

<br>

    FPO only makes sense IMHO if your NSD servers are also processing<br>

    the data (and then you need to control that somehow).<br>

<br>

    Other ideas? What else can you do with GPFS and local disks beyond<br>

    what you<br>

    considered? I suppose nothing reasonable ...<br>

<br>

<br>

    Kind regards<br>

<br>

<br>

    Dr. Uwe Falke<br>

<br>

    IT Specialist<br>

    High Performance Computing Services / Integrated Technology Services /<br>

    Data Center Services<br>

    ------------------------------<wbr>------------------------------<wbr>------------------------------<wbr>------------------------------<wbr>-------------------<br>

    IBM Deutschland<br>

    Rathausstr. 7<br>

    09111 Chemnitz<br></div></div>

    Phone: <a href="tel:%2B49%20371%206978%202165" value="+4937169782165" target="_blank">+49 371 6978 2165</a><br>

    Mobile: <a href="tel:%2B49%20175%20575%202877" value="+491755752877" target="_blank">+49 175 575 2877</a><br>

    E-Mail: <a href="mailto:uwefalke@de.ibm.com" target="_blank">uwefalke@de.ibm.com</a><span class=""><br>

    ------------------------------<wbr>------------------------------<wbr>------------------------------<wbr>------------------------------<wbr>-------------------<br>

    IBM Deutschland Business & Technology Services GmbH / Geschäftsführung:<br>

    Frank Hammer, Thorsten Moehring<br>

    Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht<br>

    Stuttgart,<br>

    HRB 17122<br>

<br>

<br>

<br>

<br>

    From:   "Oesterlin, Robert" <<a href="mailto:Robert.Oesterlin@nuance.com" target="_blank">Robert.Oesterlin@nuance.com</a><br></span>

    <mailto:<a href="mailto:Robert.Oesterlin@nuance.com" target="_blank">Robert.Oesterlin@nuanc<wbr>e.com</a>>><span class=""><br>

    To:     gpfsug main discussion list<br>

    <<a href="mailto:gpfsug-discuss@spectrumscale.org" target="_blank">gpfsug-discuss@spectrumscale.<wbr>org</a><br></span>

    <mailto:<a href="mailto:gpfsug-discuss@spectrumscale.org" target="_blank">gpfsug-discuss@spectru<wbr>mscale.org</a>>><span class=""><br>

    Date:   11/30/2016 03:34 PM<br>

    Subject:        [gpfsug-discuss] Strategies - servers with local SAS<br>

    disks<br>

    Sent by:        <a href="mailto:gpfsug-discuss-bounces@spectrumscale.org" target="_blank">gpfsug-discuss-bounces@spectru<wbr>mscale.org</a><br></span>

    <mailto:<a href="mailto:gpfsug-discuss-bounces@spectrumscale.org" target="_blank">gpfsug-discuss-bounces<wbr>@spectrumscale.org</a>><span class=""><br>

<br>

<br>

<br>

    Looking for feedback/strategies in setting up several GPFS servers with<br>

    local SAS. They would all be part of the same file system. The<br>

    systems are<br>

    all similar in configuration - 70 4TB drives.<br>

<br>

    Options I'm considering:<br>

<br>

    - Create RAID arrays of the disks on each server (worried about the RAID<br>

    rebuild time when a drive fails with 4, 6, 8TB drives)<br>

    - No RAID with 2 replicas, single drive per NSD. When a drive fails,<br>

    recreate the NSD - but then I need to fix up the data replication via<br>

    restripe<br>

    - FPO - with multiple failure groups - letting the system manage<br>

    replica<br>

    placement and then have GPFS do the restripe on disk failure<br>

    automatically<br>
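<br>

To put rough numbers on the first two options, here is a back-of-the-envelope sketch. The 70 drives per server and the 4, 6 and 8 TB sizes come from the post; the 100 MB/s sustained rebuild rate and the 8+2 RAID-6 layout are assumptions picked only for illustration, so substitute measured values before drawing conclusions.<br>

```python
# Rough sizing for one server with 70 drives (drive count from the post).
# The rebuild rate and the 8+2 RAID-6 layout are ASSUMED for illustration.

DRIVES_PER_SERVER = 70
ASSUMED_REBUILD_MB_S = 100  # hypothetical sustained rebuild rate


def rebuild_hours(drive_tb, rebuild_mb_per_s=ASSUMED_REBUILD_MB_S):
    """Hours needed to rewrite one whole drive at a constant rate."""
    drive_mb = drive_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return drive_mb / rebuild_mb_per_s / 3600


def usable_tb_raid6(drive_tb, data_disks=8, parity_disks=2):
    """Usable capacity when every drive sits in an (assumed) 8+2 RAID-6 array."""
    raw_tb = DRIVES_PER_SERVER * drive_tb
    return raw_tb * data_disks / (data_disks + parity_disks)


def usable_tb_replicated(drive_tb, replicas=2):
    """Usable capacity with no RAID and the file system keeping `replicas` copies."""
    return DRIVES_PER_SERVER * drive_tb / replicas


for size_tb in (4, 6, 8):
    print(f"{size_tb} TB drive: roughly {rebuild_hours(size_tb):.1f} h to rebuild")

print(f"RAID-6 (8+2), no replication: {usable_tb_raid6(4):.0f} TB usable per server")
print(f"No RAID, 2 replicas:          {usable_tb_replicated(4):.0f} TB usable per server")
```

Under these assumptions the rebuild time grows linearly with drive size, and layering two-way replication on top of the RAID arrays would halve the RAID-6 figure again.<br>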

<br>

    Comments or other ideas welcome.<br>

<br>

    Bob Oesterlin<br>

    Sr Principal Storage Engineer, Nuance<br></span>

    <a href="tel:507-269-0413" value="+15072690413" target="_blank">507-269-0413</a><br>

<br>

     _____________________________<wbr>__________________<br>

    gpfsug-discuss mailing list<br>

    gpfsug-discuss at <a href="http://spectrumscale.org" rel="noreferrer" target="_blank">spectrumscale.org</a> <<a href="http://spectrumscale.org" rel="noreferrer" target="_blank">http://spectrumscale.org</a>><br>

    <a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" rel="noreferrer" target="_blank">http://gpfsug.org/mailman/list<wbr>info/gpfsug-discuss</a><span class=""><br>

    <<a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" rel="noreferrer" target="_blank">http://gpfsug.org/mailman/lis<wbr>tinfo/gpfsug-discuss</a>><br>

<br>

<br>

<br>

<br>

    </span><span class="">

<br>

<br>

<br>

<br>

--<br>

Zach Giles<br>

</span><a href="mailto:zgiles@gmail.com" target="_blank">zgiles@gmail.com</a><span class=""><br>

<br>

<br>


<br>

</span></blockquote><span class="HOEnZb"><font color="#888888">

<br>

-- <br>

Aaron Knister<br>

NASA Center for Climate Simulation (Code 606.2)<br>

Goddard Space Flight Center<br>

<a href="tel:%28301%29%20286-2776" value="+13012862776" target="_blank">(301) 286-2776</a></font></span><div class="HOEnZb"><div class="h5"><br>


</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">Zach Giles<br><a href="mailto:zgiles@gmail.com" target="_blank">zgiles@gmail.com</a></div>

</div>