<html><body><p><font size="2">Hi,<br></font><font size="2"><br></font><font size="2">For reads only, have you looked at the possibility of using LROC (local read-only cache)?<br></font><font size="2"><br></font><font size="2">For writes in the setup you mention, you are down to a maximum of half your network speed (best case), assuming no restripes and no reboots are ongoing at any given time.<br></font><font size="2"><br></font><font size="2"><br></font><font size="2"><br></font><font size="2">--<br></font><font size="2">Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations<br></font><font size="2">Luis Bolinches<br></font><font size="2">Consultant IT Specialist<br></font><font size="2">Mobile Phone: +358503112585<br></font><font size="2"><a href="https://www.youracclaim.com/user/luis-bolinches">https://www.youracclaim.com/user/luis-bolinches</a><br></font><font size="2"><br></font><font size="2">"If you always give you will always have" --  Anonymous<br></font><font size="2"><br></font><font size="2">> On 14 Mar 2018, at 5.28, Lukas Hejtmanek <xhejtman@ics.muni.cz> wrote:<br></font><font size="2">> <br></font><font size="2">> Hello,<br></font><font size="2">> <br></font><font size="2">> thank you for insight. Well, the point is, that I will get ~60 with 120 NVMe<br></font><font size="2">> disks in it, each about 2TB size. It means that I will have 240TB in NVMe SSD<br></font><font size="2">> that could build nice shared scratch. Moreover, I have no different HW or place <br></font><font size="2">> to put these SSDs into. They have to be in the compute nodes.<br></font><font size="2">> <br></font><font size="2">>> On Tue, Mar 13, 2018 at 10:48:21AM -0700, Alex Chekholko wrote:<br></font><font size="2">>> I would like to discourage you from building a large distributed clustered<br></font><font size="2">>> filesystem made of many unreliable components.  
You will need to<br></font><font size="2">>> overprovision your interconnect and will also spend a lot of time in<br></font><font size="2">>> "healing" or "degraded" state.<br></font><font size="2">>> <br></font><font size="2">>> It is typically cheaper to centralize the storage into a subset of nodes<br></font><font size="2">>> and configure those to be more highly available.  E.g. of your 60 nodes,<br></font><font size="2">>> take 8 and put all the storage into those and make that a dedicated GPFS<br></font><font size="2">>> cluster with no compute jobs on those nodes.  Again, you'll still need<br></font><font size="2">>> really beefy and reliable interconnect to make this work.<br></font><font size="2">>> <br></font><font size="2">>> Stepping back; what is the actual problem you're trying to solve?  I have<br></font><font size="2">>> certainly been in that situation before, where the problem is more like: "I<br></font><font size="2">>> have a fixed hardware configuration that I can't change, and I want to try<br></font><font size="2">>> to shoehorn a parallel filesystem onto that."<br></font><font size="2">>> <br></font><font size="2">>> I would recommend looking closer at your actual workloads.  
If this is a<br></font><font size="2">>> "scratch" filesystem and file access is mostly from one node at a time,<br></font><font size="2">>> it's not very useful to make two additional copies of that data on other<br></font><font size="2">>> nodes, and it will only slow you down.<br></font><font size="2">>> <br></font><font size="2">>> Regards,<br></font><font size="2">>> Alex<br></font><font size="2">>> <br></font><font size="2">>> On Tue, Mar 13, 2018 at 7:16 AM, Lukas Hejtmanek <xhejtman@ics.muni.cz><br></font><font size="2">>> wrote:<br></font><font size="2">>> <br></font><font size="2">>>>> On Tue, Mar 13, 2018 at 10:37:43AM +0000, John Hearns wrote:<br></font><font size="2">>>>> Lukas,<br></font><font size="2">>>>> It looks like you are proposing a setup which uses your compute servers<br></font><font size="2">>>> as storage servers also?<br></font><font size="2">>>> <br></font><font size="2">>>> yes, exactly. I would like to utilise NVMe SSDs that are in every compute<br></font><font size="2">>>> servers.. 
Using them as a shared scratch area with GPFS is one of the<br></font><font size="2">>>> options.<br></font><font size="2">>>> <br></font><font size="2">>>>> <br></font><font size="2">>>>>  *   I'm thinking about the following setup:<br></font><font size="2">>>>> ~ 60 nodes, each with two enterprise NVMe SSDs, FDR IB interconnected<br></font><font size="2">>>>> <br></font><font size="2">>>>> There is nothing wrong with this concept, for instance see<br></font><font size="2">>>>> <a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__www.beegfs.io_wiki_BeeOND&d=DwIFBA&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=clDRNGIKhf6SYQ2ZZpZvBniUiqx1GU1bYEUbcbCunuo&s=ZUDwVonh6dmGRFw0n9p9QPC2-DFuVyY75gOuD02c07I&e=">https://urldefense.proofpoint.com/v2/url?u=https-3A__www.beegfs.io_wiki_BeeOND&d=DwIFBA&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=clDRNGIKhf6SYQ2ZZpZvBniUiqx1GU1bYEUbcbCunuo&s=ZUDwVonh6dmGRFw0n9p9QPC2-DFuVyY75gOuD02c07I&e=</a><br></font><font size="2">>>>> <br></font><font size="2">>>>> I have an NVMe filesystem which uses 60 drives, but there are 10 servers.<br></font><font size="2">>>>> You should look at "failure zones" also.<br></font><font size="2">>>> <br></font><font size="2">>>> you still need the storage servers and local SSDs to use only for caching,<br></font><font size="2">>>> do<br></font><font size="2">>>> I understand correctly?<br></font><font size="2">>>> <br></font><font size="2">>>>> <br></font><font size="2">>>>> From: gpfsug-discuss-bounces@spectrumscale.org [<a href="mailto:gpfsug-discuss-">mailto:gpfsug-discuss-</a><br></font><font size="2">>>> bounces@spectrumscale.org] On Behalf Of Knister, Aaron S.<br></font><font size="2">>>> (GSFC-606.2)[COMPUTER SCIENCE CORP]<br></font><font size="2">>>>> Sent: Monday, March 12, 2018 4:14 PM<br></font><font size="2">>>>> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org><br></font><font size="2">>>>> Subject: Re: 
[gpfsug-discuss] Preferred NSD<br></font><font size="2">>>>> <br></font><font size="2">>>>> Hi Lukas,<br></font><font size="2">>>>> <br></font><font size="2">>>>> Check out FPO mode. That mimics Hadoop's data placement features. You<br></font><font size="2">>>> can have up to 3 replicas both data and metadata but still the downside,<br></font><font size="2">>>> though, as you say is the wrong node failures will take your cluster down.<br></font><font size="2">>>>> <br></font><font size="2">>>>> You might want to check out something like Excelero's NVMesh (note: not<br></font><font size="2">>>> an endorsement since I can't give such things) which can create logical<br></font><font size="2">>>> volumes across all your NVMe drives. The product has erasure coding on<br></font><font size="2">>>> their roadmap. I'm not sure if they've released that feature yet but in<br></font><font size="2">>>> theory it will give better fault tolerance *and* you'll get more efficient<br></font><font size="2">>>> usage of your SSDs.<br></font><font size="2">>>>> <br></font><font size="2">>>>> I'm sure there are other ways to skin this cat too.<br></font><font size="2">>>>> <br></font><font size="2">>>>> -Aaron<br></font><font size="2">>>>> <br></font><font size="2">>>>> <br></font><font size="2">>>>> <br></font><font size="2">>>>> On March 12, 2018 at 10:59:35 EDT, Lukas Hejtmanek <xhejtman@ics.muni.cz<br></font><font size="2">>>> <<a href="mailto:xhejtman@ics.muni.cz">mailto:xhejtman@ics.muni.cz</a>>> wrote:<br></font><font size="2">>>>> Hello,<br></font><font size="2">>>>> <br></font><font size="2">>>>> I'm thinking about the following setup:<br></font><font size="2">>>>> ~ 60 nodes, each with two enterprise NVMe SSDs, FDR IB interconnected<br></font><font size="2">>>>> <br></font><font size="2">>>>> I would like to setup shared scratch area using GPFS and those NVMe<br></font><font size="2">>>> SSDs. 
Each<br></font><font size="2">>>>>> SSDs as on NSD.<br></font><font size="2">>>>>> <br></font><font size="2">>>>>> I don't think like 5 or more data/metadata replicas are practical here.<br></font><font size="2">>>> On the<br></font><font size="2">>>>>> other hand, multiple node failures is something really expected.<br></font><font size="2">>>>>> <br></font><font size="2">>>>>> Is there a way to instrument that local NSD is strongly preferred to<br></font><font size="2">>>> store<br></font><font size="2">>>>>> data? I.e. node failure most probably does not result in unavailable<br></font><font size="2">>>> data for<br></font><font size="2">>>>>> the other nodes?<br></font><font size="2">>>>>> <br></font><font size="2">>>>>> Or is there any other recommendation/solution to build shared scratch<br></font><font size="2">>>> with<br></font><font size="2">>>>>> GPFS in such setup? (Do not do it including.)<br></font><font size="2">>>>>> <br></font><font size="2">>>>>> --<br></font><font size="2">>>>>> Lukáš Hejtmánek<br></font><font size="2">>>>>> _______________________________________________<br></font><font size="2">>>>>> gpfsug-discuss mailing list<br></font><font size="2">>>>>> gpfsug-discuss at spectrumscale.org<br></font><font size="2">>>>>> <a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFBA&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=clDRNGIKhf6SYQ2ZZpZvBniUiqx1GU1bYEUbcbCunuo&s=ZLEoHFOFkjfuvNw57WqNn6-EVjHbASRHgnmRc2YYXpM&e=">https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFBA&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=clDRNGIKhf6SYQ2ZZpZvBniUiqx1GU1bYEUbcbCunuo&s=ZLEoHFOFkjfuvNw57WqNn6-EVjHbASRHgnmRc2YYXpM&e=</a><br></font><font size="2">>>> <br></font><font size="2">>>> <br></font><font 
size="2">>>> --<br></font><font size="2">>>> Lukáš Hejtmánek<br></font><font size="2">>>> <br></font><font size="2">>>> Linux Administrator only because<br></font><font size="2">>>>  Full Time Multitasking Ninja<br></font><font size="2">>>>  is not an official job title<br></font><font size="2">>>> <br></font><font size="2">> <br></font><font size="2">> <br></font><font size="2">> <br></font><font size="2">> -- <br></font><font size="2">> Lukáš Hejtmánek<br></font><font size="2">> <br></font><font size="2">> Linux Administrator only because<br></font><font size="2">>  Full Time Multitasking Ninja <br></font><font size="2">>  is not an official 
job title<br></font><font size="2">> <br></font><BR>




</body></html>