From jan.sundermann at kit.edu Mon Aug 5 17:01:10 2019 From: jan.sundermann at kit.edu (Sundermann, Jan Erik (SCC)) Date: Mon, 5 Aug 2019 16:01:10 +0000 Subject: [gpfsug-discuss] Moving data between dependent filesets Message-ID: <4C53EA39-51BA-4E3F-BF33-38606A560B7F@kit.edu> Dear all, I am trying to understand how to move data efficiently between filesets sharing the same inode space. I have an independent fileset fs1 which contains data that I would like to move to a newly created dependent fileset fs2. fs1 and fs2 are sharing the same inode space. Apparently calling mv is copying the data instead of just moving it. Using strace on mv prints lines like renameat2(AT_FDCWD, "subdir1/file257", AT_FDCWD, "../filesettest/subdir1/file257", 0) = -1 EXDEV (Invalid cross-device link) Is there an efficient way to move the data between the filesets fs1 and fs2? Best regards Jan Erik -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5102 bytes Desc: not available URL: From jfosburg at mdanderson.org Mon Aug 5 17:07:14 2019 From: jfosburg at mdanderson.org (Fosburgh,Jonathan) Date: Mon, 5 Aug 2019 16:07:14 +0000 Subject: [gpfsug-discuss] [EXT] Moving data between dependent filesets Message-ID: AFAIK, the mv command treats filesets as separate filesystems, even when sharing the same inode space. -- Jonathan Fosburgh Principal Application Systems Analyst IT Operations Storage Team The University of Texas MD Anderson Cancer Center (713) 745-9346 ?On 8/5/19, 11:02 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Sundermann, Jan Erik (SCC)" wrote: Dear all, I am trying to understand how to move data efficiently between filesets sharing the same inode space. I have an independent fileset fs1 which contains data that I would like to move to a newly created dependent fileset fs2. fs1 and fs2 are sharing the same inode space. Apparently calling mv is copying the data instead of just moving it. Using strace on mv prints lines like renameat2(AT_FDCWD, "subdir1/file257", AT_FDCWD, "../filesettest/subdir1/file257", 0) = -1 EXDEV (Invalid cross-device link) Is there an efficient way to move the data between the filesets fs1 and fs2? Best regards Jan Erik The information contained in this e-mail message may be privileged, confidential, and/or protected from disclosure. This e-mail message may contain protected health information (PHI); dissemination of PHI should comply with applicable federal and state laws. If you are not the intended recipient, or an authorized representative of the intended recipient, any further review, disclosure, use, dissemination, distribution, or copying of this message or any attachment (or the information contained therein) is strictly prohibited. If you think that you have received this e-mail message in error, please notify the sender by return e-mail and delete all references to it and its contents from your systems. From makaplan at us.ibm.com Thu Aug 8 22:08:11 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 8 Aug 2019 17:08:11 -0400 Subject: [gpfsug-discuss] Asymetric SAN with GPFS In-Reply-To: <20190729161138.znuss5dt2rhig6cv@ics.muni.cz> References: <20190729161138.znuss5dt2rhig6cv@ics.muni.cz> Message-ID: Think GPFS POOLs ... Generally, you should not mix different kinds of LUNs into the same GPFS POOL. 
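As a purely illustrative sketch (the NSD names, device paths, and pool/fileset names below are invented, not taken from this thread), keeping unlike LUNs in separate pools looks roughly like this:

NSD stanzas (mmcrnsd / mmadddisk stanza format), fast LUNs in one pool, slow LUNs in another:

  %nsd: nsd=fast01 device=/dev/mapper/fast_lun01 usage=dataAndMetadata failureGroup=1 pool=system
  %nsd: nsd=slow01 device=/dev/mapper/slow_lun01 usage=dataOnly failureGroup=2 pool=slowdata

Placement policy (installed with mmchpolicy) so that new files land in the intended pool:

  RULE 'toSlow' SET POOL 'slowdata' FOR FILESET ('archive')
  RULE 'default' SET POOL 'system'

With the slow LUNs isolated in their own pool, I/O against the fast pool is no longer paced by the slowest disk, and data can still be shifted between pools later with MIGRATE rules or mmrestripefs.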
From: Lukas Hejtmanek To: gpfsug-discuss at spectrumscale.org Date: 07/29/2019 12:12 PM Subject: [EXTERNAL] [gpfsug-discuss] Asymetric SAN with GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, is there any settings for GPFS 5.x so that you could mitigate slow down of asymmetric SAN? The asymmetric SAN means, that not every LUN has the same speed, or not every disk array has the same number of LUNs. It seems that overal speed is degraded to the slowest LUN. Is there any workaround for this except avoiding that LUN at all? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFBA&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=4g2oEvesNo0UZtsAeNzm33hQ9jYDALfLllZkZ2nMpak&s=aRp743C0EJjuap9PuF7U5CoPcqti2RRnGi6CwKojYtI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From robert at strubi.ox.ac.uk Fri Aug 9 14:29:14 2019 From: robert at strubi.ox.ac.uk (Robert Esnouf) Date: Fri, 09 Aug 2019 14:29:14 +0100 Subject: [gpfsug-discuss] relion software using GPFS storage In-Reply-To: References: Message-ID: <230cab76cb83cd153d2f9ed33c1e47e5@strubi.ox.ac.uk> Dear Wei, Not a lot on information to go on there... e.g. layout of the MPI processes on compute nodes, the interconnect and the GPFS settings... but the standout information appears to be: "10X slower than local SSD, and nfs reexport of another gpfs filesystem" "The per process IO is very slow, 4-5 MiB/s, while on ssd and nfs I got 20-40 MiB/s" You also not 2GB/s performance for 4MB writes, and 1.7GB/s read. That is only 500 IOPS, I assume you'd see more with 4kB reads/writes. I'd also note that 10x slower is kind of an intermediate number, its bad but not totally unproductive. I think the likely issues are going to be around the GPFS (client) config, although you might also be struggling with IOPS. The fact that the NFS re-export trick works (allowing O/S-level lazy caching and instant re-opening of files) suggests that total performance is not your issue. Upping the pagepool and/or maxStatCache etc may just make all these issues go away. If I picked out the right benchmark, then it is one with a 360 box size which is not too small... I don't know how many files comprise your particle set... Regards, Robert -- Dr Robert Esnouf University Research Lecturer, Director of Research Computing BDI, Head of Research Computing Core WHG, NDM Research Computing Strategy Officer Main office: Room 10/028, Wellcome Centre for Human Genetics, Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK Emails: robert at strubi.ox.ac.uk / robert at well.ox.ac.uk / robert.esnouf at bdi.ox.ac.uk Tel: (+44)-1865-287783 (WHG); (+44)-1865-743689 (BDI) ? ----- Original Message ----- From: Guo, Wei (Wei.Guo at STJUDE.ORG) Date: 08/08/19 23:19 To: gpfsug-discuss at spectrumscale.org, robert at strubi.ox.ac.uk, robert at well.ox.ac.uk, robert.esnouf at bid.ox.ac.uk Subject: [gpfsug-discuss] relion software using GPFS storage Hi, Robert and Michael, What are the settings within relion for parallel file systems? 
Sorry to bump this old threads, as I don't see any further conversation, and I cannot join the mailing list recently due to the spectrumscale.org:10000 web server error. I used to be in this mailing list with my previous work (email). The problem is I also see Relion 3 does not like GPFS. It is obscenely slow, slower than anything... local ssd, nfs reexport of gpfs. I am using the standard benchmarks from Relion 3 website. The mpirun -n 9 `which relion_refine_mpi` is 10X slower than local SSD, and nfs reexport of another gpfs filesystem. The latter two I can get close results (1hr25min) as compared with the publish results (1hr13min) on the same Intel Xeon Gold 6148 CPU @2.40GHz and 4 V100 GPU cards, with the same command. Running the same standard benchmark it takes 15-20 min for one iteration, should be <1.7 mins. The per process IO is very slow, 4-5 MiB/s, while on ssd and nfs I got 20-40 MiB/s if watching the /proc//io of the relion_refine processes. My gpfs client can see ~2GB/s when benchmarking with IOZONE, yes, 2GB/s because of small system, 70? drives. Record Size 4096 kB O_DIRECT feature enabled File size set to 20971520 kB Command line used: iozone -r 4m -I -t 16 -s 20g Output is in kBytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 kBytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. Throughput test with 16 processes Each process writes a 20971520 kByte file in 4096 kByte records Children see throughput for 16 initial writers = 1960218.38 kB/sec Parent sees throughput for 16 initial writers = 1938463.07 kB/sec Min throughput per process = ?120415.66 kB/sec? Max throughput per process = ?123652.07 kB/sec Avg throughput per process = ?122513.65 kB/sec Min xfer = 20426752.00 kB Children see throughput for 16 readers = 1700354.00 kB/sec Parent sees throughput for 16 readers = 1700046.71 kB/sec Min throughput per process = ?104587.73 kB/sec? Max throughput per process = ?108182.84 kB/sec Avg throughput per process = ?106272.12 kB/sec Min xfer = 20275200.00 kB The --no_parallel_disk_io is even worse. --only_do_unfinished_movies does not help much. Please advise. Thanks Wei Guo Computational Engineer, St Jude Children's Research Hospital wei.guo at stjude.org Dear Michael, There are settings within relion for parallel file systems, you should check they are enabled if you have SS underneath. Otherwise, check which version of relion and then try to understand the problem that is being analysed a little more. If the box size is very small and the internal symmetry low then the user may read 100,000s of small "picked particle" files for each iteration opening and closing the files each time. I believe that relion3 has some facility for extracting these small particles from the larger raw images and that is more SS-friendly. Alternatively, the size of the set of picked particles is often only in 50GB range and so staging to one or more local machines is quite feasible... Hope one of those suggestions helps. Regards, Robert -- Dr Robert Esnouf University Research Lecturer, Director of Research Computing BDI, Head of Research Computing Core WHG, NDM Research Computing Strategy Officer Main office: Room 10/028, Wellcome Centre for Human Genetics, Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK Emails: robert at strubi.ox.ac.uk / robert at well.ox.ac.uk / robert.esnouf at bdi.ox.ac.uk Tel: (+44)-1865-287783 (WHG); (+44)-1865-743689 (BDI) ? 
-----Original Message----- From: "Michael Holliday" To: gpfsug-discuss at spectrumscale.org Date: 27/02/19 12:21 Subject: [gpfsug-discuss] relion software using GPFS storage Hi All, ? We?ve recently had an issue where a job on our client GPFS cluster caused out main storage to go extremely slowly.? ?The job was running relion using MPI (https://www2.mrc-lmb.cam.ac.uk/relion/index.php?title=Main_Page) ? It caused waiters across the cluster, and caused the load to spike on NSDS on at a time.? When the spike ended on one NSD, it immediately started on another.? ? There were no obvious errors in the logs and the issues cleared immediately after the job was cancelled.? ? Has anyone else see any issues with relion using GPFS storage? ? Michael ? Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing STP | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 ? The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Email Disclaimer: www.stjude.org/emaildisclaimer Consultation Disclaimer: www.stjude.org/consultationdisclaimer -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.mamach at northwestern.edu Fri Aug 9 18:46:34 2019 From: alex.mamach at northwestern.edu (Alexander John Mamach) Date: Fri, 9 Aug 2019 17:46:34 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles Message-ID: Hi folks, We?re currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Fri Aug 9 19:03:09 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 9 Aug 2019 18:03:09 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Aug 9 19:09:18 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 9 Aug 2019 18:09:18 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: Message-ID: <3918E7F4-D499-489B-9D23-22B5C456D637@rutgers.edu> I?ve needed the same sort of thing ? we use NHC to check for FS status and we?ve had cases that we were not able to detect because they were in this ?Stale file handle? state. GPFS doesn?t always seem to behave in the way Linux would expect. > On Aug 9, 2019, at 2:03 PM, Frederick Stock wrote: > > Are you able to explain why you want to check for stale file handles? 
Are you attempting to detect failures of some sort, and why do the existing mechanisms in GPFS not provide the functionality you require? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > ----- Original message ----- > From: Alexander John Mamach > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Checking for Stale File Handles > Date: Fri, Aug 9, 2019 1:46 PM > > Hi folks, > > We?re currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. > > Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. > > Thanks, > > Alex > > Senior Systems Administrator > > Research Computing Infrastructure > Northwestern University Information Technology (NUIT) > > 2020 Ridge Ave > Evanston, IL 60208-4311 > > O: (847) 491-2219 > M: (312) 887-1881 > www.it.northwestern.edu > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From rmoye at quantlab.com Fri Aug 9 19:03:05 2019 From: rmoye at quantlab.com (Roger Moye) Date: Fri, 9 Aug 2019 18:03:05 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: Message-ID: <5db3a0bf06754c73b83a50db0b577847@quantlab.com> You might try something like: timeout --kill-after=5 --signal=SIGKILL 5 test -d /some/folder/on/gpfs [cid:image001.png at 01D22319.C7D5D540] Roger Moye HPC Engineer 713.425.6236 Office 713.898.0021 Mobile QUANTLAB Financial, LLC 3 Greenway Plaza Suite 200 Houston, Texas 77046 www.quantlab.com From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Alexander John Mamach Sent: Friday, August 9, 2019 12:47 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Checking for Stale File Handles Hi folks, We're currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. 
This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 3364 bytes Desc: image001.png URL: From ewahl at osc.edu Fri Aug 9 19:54:48 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 9 Aug 2019 18:54:48 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: Message-ID: We use NHC here (Node Health Check) from LBNL and our SS clients are almost all using NFS root. We have a check where we look for access to a couple of dotfiles (we have multiple SS file systems) and will mark a node offline if the checks fail. Many things can contribute to the failure of a single client node as we all know. Our checks are for actual node health on the clients, NOT to assess the health of the File Systems themselves. I will normally see MANY other problems from other monitoring sources long before I normally see stale file handles at the client level. We did have to turn up the timeout for a check of the file to return on very busy clients, but we've haven't seen slowdowns due to hundreds of nodes all checking the file at the same time. Localized node slowdowns will occasionally mark a node offline for this check here and there (normally a node that is extremely busy), but the next check will put the node right back online in the batch system. Ed Wahl Ohio Supercomputer Center ewahl at osc.edu ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Alexander John Mamach Sent: Friday, August 9, 2019 1:46 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Checking for Stale File Handles Hi folks, We?re currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alex.mamach at northwestern.edu Fri Aug 9 21:32:49 2019 From: alex.mamach at northwestern.edu (Alexander John Mamach) Date: Fri, 9 Aug 2019 20:32:49 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: , Message-ID: Hi Fred, We sometimes find a node will show that GPFS is active when running mmgetstate, but one of our GPFS filesystems, (such as our home or projects filesystems) are inaccessible to users, while the other GPFS-mounted filesystems behave as expected. Our current node health checks don?t always detect this, especially when it?s for a resource-based mount that doesn?t impact the node but would impact jobs trying to run on the node. If there is something native to GPFS that can detect this, all the better, but I?m simply unaware of how to do so. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederick Stock Sent: Friday, August 9, 2019 1:03:09 PM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Checking for Stale File Handles Are you able to explain why you want to check for stale file handles? Are you attempting to detect failures of some sort, and why do the existing mechanisms in GPFS not provide the functionality you require? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Alexander John Mamach Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking for Stale File Handles Date: Fri, Aug 9, 2019 1:46 PM Hi folks, We?re currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Mon Aug 12 09:30:14 2019 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Mon, 12 Aug 2019 10:30:14 +0200 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: , Message-ID: Hi Alex, did you try mmhealth ? It should detect stale file handles of the gpfs filesystems already and report a "stale_mount" event. 
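A minimal sketch of what that check could look like from the shell (treat it as illustrative; component and event names can vary slightly between releases):

  # per-node view of file system health; a stale GPFS mount should be flagged here
  mmhealth node show FILESYSTEM

  # recent events on this node, filtered for the stale_mount event
  mmhealth node eventlog | grep -i stale_mount

  # cluster-wide summary, run from any node
  mmhealth cluster show

Because mmhealth reports what the monitoring service has already observed, querying it is far cheaper than having hundreds of nodes stat the file system directly.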
Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development - Release Lead Architect (4.2.x) Spectrum Scale RAS Architect --------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49 70342744105 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ----------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk WittkoppSitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Alexander John Mamach To: gpfsug main discussion list Cc: "gpfsug-discuss at spectrumscale.org" Date: 09/08/2019 22:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking for Stale File Handles Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Fred, We sometimes find a node will show that GPFS is active when running mmgetstate, but one of our GPFS filesystems, (such as our home or projects filesystems) are inaccessible to users, while the other GPFS-mounted filesystems behave as expected. Our current node health checks don?t always detect this, especially when it?s for a resource-based mount that doesn?t impact the node but would impact jobs trying to run on the node. If there is something native to GPFS that can detect this, all the better, but I?m simply unaware of how to do so. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederick Stock Sent: Friday, August 9, 2019 1:03:09 PM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Checking for Stale File Handles Are you able to explain why you want to check for stale file handles? Are you attempting to detect failures of some sort, and why do the existing mechanisms in GPFS not provide the functionality you require? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Alexander John Mamach Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking for Stale File Handles Date: Fri, Aug 9, 2019 1:46 PM Hi folks, We?re currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. 
Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=9dCEbNr27klWay2AcOfvOE1xq50K-CyRUu4qQx4HOlk&m=sUjgq9g2p2ncIpALAqAhOqt7blwynTJmgmFdYYik7MI&s=EFC3lNuf6koYPMPSWuYCNhwmIMUKKZ9mCQFhxVCYWLQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Aug 12 11:42:49 2019 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 12 Aug 2019 12:42:49 +0200 Subject: [gpfsug-discuss] Fileheat Message-ID: Hello, I am having difficulties with Spectrum Scale's fileheat feature on Spectrum Scale 5.0.2/5.0.3: The config has it activated: # mmlsconfig | grep fileHeat fileHeatPeriodMinutes 720 Now everytime I look at the files using mmapplypolicy I only see 0 for the fileheat. I have both tried reading files via nfs and locally. No difference, the fileheat always stays at zero. What could be wrong here? How to debug? We are exporting the filesystem using kernel NFS which is working fine. However, the documentation states that root access is not taken into account for fileheat, so I am wondering if that setup is supposed to work at all? Thx, Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From alvise.dorigo at psi.ch Mon Aug 12 14:03:47 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 12 Aug 2019 13:03:47 +0000 Subject: [gpfsug-discuss] AFM and SELinux Message-ID: <83A6EEB0EC738F459A39439733AE80452BE96124@MBX114.d.ethz.ch> Dear GPFS users, does anybody know if AFM behaves correctly if the AFM gateway has SELinux "Disabled" and NFS server has SElinux "Enforcing" ? thanks, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Aug 12 14:38:59 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 12 Aug 2019 09:38:59 -0400 Subject: [gpfsug-discuss] Fileheat In-Reply-To: References: Message-ID: My Admin guide says: The loss percentage and period are set via the configuration variables fileHeatLossPercent and fileHeatPeriodMinutes. By default, the file access temperature is not tracked. To use access temperature in policy, the tracking must first be enabled. To do this, set the two configuration variables as follows: fileHeatLossPercent The percentage (between 0 and 100) of file access temperature dissipated over the fileHeatPeriodMinutes time. The default value is 10. Chapter 25. Information lifecycle management for IBM Spectrum Scale 361 fileHeatPeriodMinutes The number of minutes defined for the recalculation of file access temperature. 
To turn on tracking, fileHeatPeriodMinutes must be set to a nonzero value. The default value is 0 SO Try setting both! ALSO to take effect you may have to mmshutdown and mmstartup, at least on the (client gpfs) nodes that are accessing the files of interest. From: Ulrich Sibiller To: gpfsug-discuss at gpfsug.org Date: 08/12/2019 06:50 AM Subject: [EXTERNAL] [gpfsug-discuss] Fileheat Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I am having difficulties with Spectrum Scale's fileheat feature on Spectrum Scale 5.0.2/5.0.3: The config has it activated: # mmlsconfig | grep fileHeat fileHeatPeriodMinutes 720 Now everytime I look at the files using mmapplypolicy I only see 0 for the fileheat. I have both tried reading files via nfs and locally. No difference, the fileheat always stays at zero. What could be wrong here? How to debug? We are exporting the filesystem using kernel NFS which is working fine. However, the documentation states that root access is not taken into account for fileheat, so I am wondering if that setup is supposed to work at all? Thx, Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=vQ1ASSRY5HseAqfNFONyHvd4crfRlWttZe2Uti0rx1s&s=Q7wAWezSHse5uPfvwobmcmASiGvpLfbKy97sqRkvJ-M&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From u.sibiller at science-computing.de Tue Aug 13 10:22:27 2019 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Tue, 13 Aug 2019 11:22:27 +0200 Subject: [gpfsug-discuss] Fileheat In-Reply-To: References: Message-ID: On 12.08.19 15:38, Marc A Kaplan wrote: > My Admin guide says: > > The loss percentage and period are set via the configuration > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By default, the file access temperature > is not > tracked. To use access temperature in policy, the tracking must first be enabled. To do this, set > the two > configuration variables as follows:* Yes, I am aware of that. > fileHeatLossPercent* > The percentage (between 0 and 100) of file access temperature dissipated over the* > fileHeatPeriodMinutes *time. The default value is 10. > Chapter 25. Information lifecycle management for IBM Spectrum Scale *361** > fileHeatPeriodMinutes* > The number of minutes defined for the recalculation of file access temperature. To turn on > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. The default value is 0 > > > SO Try setting both! Well, I have not because the documentation explicitly mentions a default. What's the point of a default if I have to explicitly configure it? 
> ALSO to take effect you may have to mmshutdown and mmstartup, at least on the (client gpfs) nodes > that are accessing the files of interest. I have now configured both parameters and restarted GPFS. Ran a tar over a directory - still no change. I will wait for 720minutes and retry (tomorrow). Thanks Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From janfrode at tanso.net Tue Aug 13 11:22:13 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 13 Aug 2019 12:22:13 +0200 Subject: [gpfsug-discuss] Fileheat In-Reply-To: References: Message-ID: What about filesystem atime updates. We recently changed the default to ?relatime?. Could that maybe influence heat tracking? -jf tir. 13. aug. 2019 kl. 11:29 skrev Ulrich Sibiller < u.sibiller at science-computing.de>: > On 12.08.19 15:38, Marc A Kaplan wrote: > > My Admin guide says: > > > > The loss percentage and period are set via the configuration > > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By default, > the file access temperature > > is not > > tracked. To use access temperature in policy, the tracking must first be > enabled. To do this, set > > the two > > configuration variables as follows:* > > Yes, I am aware of that. > > > fileHeatLossPercent* > > The percentage (between 0 and 100) of file access temperature dissipated > over the* > > fileHeatPeriodMinutes *time. The default value is 10. > > Chapter 25. Information lifecycle management for IBM Spectrum Scale > *361** > > fileHeatPeriodMinutes* > > The number of minutes defined for the recalculation of file access > temperature. To turn on > > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. The > default value is 0 > > > > > > SO Try setting both! > > Well, I have not because the documentation explicitly mentions a default. > What's the point of a > default if I have to explicitly configure it? > > > ALSO to take effect you may have to mmshutdown and mmstartup, at least > on the (client gpfs) nodes > > that are accessing the files of interest. > > I have now configured both parameters and restarted GPFS. Ran a tar over a > directory - still no > change. I will wait for 720minutes and retry (tomorrow). > > Thanks > > Uli > > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Aug 13 14:32:53 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 13 Aug 2019 09:32:53 -0400 Subject: [gpfsug-discuss] Fileheat - does work! 
Complete test/example provided here. In-Reply-To: References: Message-ID:

Yes, you are correct. It should only be necessary to set fileHeatPeriodMinutes, since the loss percent does have a default value. But IIRC (I implemented part of this!) you must restart the daemon to get those fileheat parameter(s) "loaded" and initialized into the daemon processes.

Not fully trusting my memory... I will now "prove" this works today as follows: To test, create and re-read a large file with dd...

[root@/main/gpfs-git]$ mmchconfig fileHeatPeriodMinutes=60
mmchconfig: Command successfully completed
...
[root@/main/gpfs-git]$ mmlsconfig | grep -i heat
fileHeatPeriodMinutes 60
[root@/main/gpfs-git]$ mmshutdown
...
[root@/main/gpfs-git]$ mmstartup
...
[root@/main/gpfs-git]$ mmmount c23
...
[root@/main/gpfs-git]$ ls -l /c23/10g
-rw-r--r--. 1 root root 10737418240 May 16 15:09 /c23/10g
[root@/main/gpfs-git]$ mmlsattr -d -X /c23/10g
file name: /c23/10g
security.selinux

(NO fileheat attribute yet...)

[root@/main/gpfs-git]$ dd if=/c23/10g bs=1M of=/dev/null
...

After the command finishes, you may need to wait a while for the metadata to flush to the inode on disk ... or you can force that with an unmount or a mmfsctl... Then the fileheat attribute will appear (I just waited by answering another email... No need to do any explicit operations on the file system..)

[root@/main/gpfs-git]$ mmlsattr -d -X /c23/10g
file name: /c23/10g
security.selinux
gpfs.FileHeat

To see its hex string value:

[root@/main/gpfs-git]$ mmlsattr -d -X -L /c23/10g
file name: /c23/10g
...
security.selinux: 0x756E636F6E66696E65645F753A6F626A6563745F723A756E6C6162656C65645F743A733000
gpfs.FileHeat: 0x000000EE42A40400

Which will be interpreted by mmapplypolicy... YES, the interpretation is relative to last access time and current time, and done by a policy/sql function "computeFileHeat" (You could find this using m4 directives in your policy file...)

define([FILE_HEAT],[computeFileHeat(CURRENT_TIMESTAMP-ACCESS_TIME,xattr('gpfs.FileHeat'),KB_ALLOCATED)])

Well gone that far, might as well try mmapplypolicy too....

[root@/main/gpfs-git]$ cat /gh/policies/fileheat.policy
define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END])
rule fh1 external list 'fh' exec ''
rule fh2 list 'fh' weight(FILE_HEAT)
  show(DISPLAY_NULL(xattr_integer('gpfs.FileHeat',1,4,'B')) || ' ' ||
       DISPLAY_NULL(xattr_integer('gpfs.FileHeat',5,2,'B')) || ' ' ||
       DISPLAY_NULL(xattr_integer('gpfs.FileHeat',7,2,'B')) || ' ' ||
       DISPLAY_NULL(FILE_HEAT) || ' ' ||
       DISPLAY_NULL(hex(xattr('gpfs.FileHeat'))) || ' ' ||
       getmmconfig('fileHeatPeriodMinutes') || ' ' ||
       getmmconfig('fileHeatLossPercent') || ' ' ||
       getmmconfig('clusterName'))

[root@/main/gpfs-git]$ mmapplypolicy /c23 --maxdepth 1 -P /gh/policies/fileheat.policy -I test -L 3
...
<1> /c23/10g RULE 'fh2' LIST 'fh' WEIGHT(0.022363) SHOW( 238 17060 1024 +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com)
...
WEIGHT(0.022363) LIST 'fh' /c23/10g SHOW(238 17060 1024 +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com)

From: Jan-Frode Myklebust
To: gpfsug main discussion list
Date: 08/13/2019 06:22 AM
Subject: [EXTERNAL] Re: [gpfsug-discuss] Fileheat
Sent by: gpfsug-discuss-bounces at spectrumscale.org

What about filesystem atime updates. We recently changed the default to 'relatime'. Could that maybe influence heat tracking?

-jf

tir. 13. aug. 2019 kl.
11:29 skrev Ulrich Sibiller < u.sibiller at science-computing.de>: On 12.08.19 15:38, Marc A Kaplan wrote: > My Admin guide says: > > The loss percentage and period are set via the configuration > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By default, the file access temperature > is not > tracked. To use access temperature in policy, the tracking must first be enabled. To do this, set > the two > configuration variables as follows:* Yes, I am aware of that. > fileHeatLossPercent* > The percentage (between 0 and 100) of file access temperature dissipated over the* > fileHeatPeriodMinutes *time. The default value is 10. > Chapter 25. Information lifecycle management for IBM Spectrum Scale *361** > fileHeatPeriodMinutes* > The number of minutes defined for the recalculation of file access temperature. To turn on > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. The default value is 0 > > > SO Try setting both! Well, I have not because the documentation explicitly mentions a default. What's the point of a default if I have to explicitly configure it? > ALSO to take effect you may have to mmshutdown and mmstartup, at least on the (client gpfs) nodes > that are accessing the files of interest. I have now configured both parameters and restarted GPFS. Ran a tar over a directory - still no change. I will wait for 720minutes and retry (tomorrow). Thanks Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=q-xjYq0Bimv3bYK1rhVMZ7jLvoEssmvfyMF0kcf5slc&s=buMugfNwUbsXJ2Gi04A3ehIQ0v-ORRc-Mb7sxaGgLrA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From abhisdav at in.ibm.com Wed Aug 14 07:38:56 2019 From: abhisdav at in.ibm.com (Abhishek Dave) Date: Wed, 14 Aug 2019 12:08:56 +0530 Subject: [gpfsug-discuss] AFM and SELinux In-Reply-To: <83A6EEB0EC738F459A39439733AE80452BE96124@MBX114.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE80452BE96124@MBX114.d.ethz.ch> Message-ID: Hi, SELinux is officially not supported with AFM & ADR as of now. Support will be added in upcoming release. May i know if you are using kernel nfs or Ganesha and on which release? In my opinion Kernel NFS should work. 
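For what it is worth, a quick sanity check of what each side is actually running needs nothing Scale-specific; the sketch below is plain RHEL tooling (assuming auditd is active on the NFS server):

  # on the AFM gateway and on the NFS server
  getenforce        # prints Enforcing, Permissive or Disabled
  sestatus          # mode, loaded policy and further detail

  # on the enforcing NFS server, look for recent AVC denials against the export
  ausearch -m avc -ts recent

If the NFS server logs AVC denials for the exported path while the gateway runs with SELinux disabled, the issue is server-side policy rather than AFM itself.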
Thanks, Abhishek, Dave From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 08/12/2019 06:40 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM and SELinux Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFS users, does anybody know if AFM behaves correctly if the AFM gateway has SELinux "Disabled" and NFS server has SElinux "Enforcing" ? thanks, Alvise_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=gmKxLriPwxbU8DBgwjZJwuhzDr6RwM7JisZU6_htZRw&m=M2nCwpuhe4WfjF_Qcq5iyTJkZzqBrYYtd7g1pQcev4Q&s=n8ZNq9h3P_quAD-7nWT42DPt8ZQJNehK35O0Tn-Dtd8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From alvise.dorigo at psi.ch Wed Aug 14 08:17:24 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Wed, 14 Aug 2019 07:17:24 +0000 Subject: [gpfsug-discuss] AFM and SELinux In-Reply-To: References: <83A6EEB0EC738F459A39439733AE80452BE96124@MBX114.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE80452BE96501@MBX114.d.ethz.ch> Hi Dave, thanks for the answer. Sorry when I mentioned NFS I was not talking about CES service, but the NFS server that export the HOME filesystem to the gateway AFM. A ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Abhishek Dave [abhisdav at in.ibm.com] Sent: Wednesday, August 14, 2019 8:38 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM and SELinux Hi, SELinux is officially not supported with AFM & ADR as of now. Support will be added in upcoming release. May i know if you are using kernel nfs or Ganesha and on which release? In my opinion Kernel NFS should work. Thanks, Abhishek, Dave [Inactive hide details for "Dorigo Alvise (PSI)" ---08/12/2019 06:40:34 PM---Dear GPFS users, does anybody know if AFM behaves c]"Dorigo Alvise (PSI)" ---08/12/2019 06:40:34 PM---Dear GPFS users, does anybody know if AFM behaves correctly if the AFM gateway has SELinux "Disabled From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 08/12/2019 06:40 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM and SELinux Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear GPFS users, does anybody know if AFM behaves correctly if the AFM gateway has SELinux "Disabled" and NFS server has SElinux "Enforcing" ? thanks, Alvise_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From lore at cscs.ch Fri Aug 16 16:49:05 2019 From: lore at cscs.ch (Lo Re Giuseppe) Date: Fri, 16 Aug 2019 15:49:05 +0000 Subject: [gpfsug-discuss] SS on RHEL 7.7 Message-ID: Hi, Has anybody tried to run Spectrum Scale on RHEL 7.7? It is not listed on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linuxrest ?. 
yet I would be interested to know whether anyone is already running it, in test or production systems. Thanks! Giuseppe -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Aug 16 16:03:41 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 16 Aug 2019 15:03:41 +0000 Subject: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Message-ID: <48973175-A514-4012-92F6-D63CF8054623@nuance.com> I want to run mmcheckquota on a file system I have on my ESS. It has 1.3B files. Obviously it will take a long time, but I?m looking for anyone who has guidance running this command on a large file system, what I should expect for IO load. - Should I set QOS to minimize impact? - Run directly on the ESS IO nodes, or some other nodes? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Aug 16 17:01:40 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 16 Aug 2019 12:01:40 -0400 Subject: [gpfsug-discuss] SS on RHEL 7.7 In-Reply-To: References: Message-ID: <92DF159B-3BF6-411D-9DB5-48C3E7852667@brown.edu> I installed it on a client machine that was accidentally upgraded to rhel7.7. There were two type mismatch warnings during the gplbin rpm build but gpfs started up and mounted the filesystem successfully. Client is running ss 5.0.3. -- ddj Dave Johnson > On Aug 16, 2019, at 11:49 AM, Lo Re Giuseppe wrote: > > Hi, > > Has anybody tried to run Spectrum Scale on RHEL 7.7? > It is not listed on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linuxrest > ?. yet > I would be interested to know whether anyone is already running it, in test or production systems. > > Thanks! > > Giuseppe > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.schlipalius at pawsey.org.au Mon Aug 19 03:35:02 2019 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Mon, 19 Aug 2019 10:35:02 +0800 Subject: [gpfsug-discuss] Announcing 2019 October 18th Australian Spectrum Scale User Group event - call for user case speakers Message-ID: <9C44E510-3777-4706-A435-E9C374CD2180@pawsey.org.au> Hello all, This is the announcement for the Spectrum Scale Usergroup Sydney Australia on Friday the 18th October 2019. All current draft Australian Spectrum Scale User Group event details can be found here: http://bit.ly/2YOFQ3u We are calling for user case speakers please ? let Ulf or myself know if you are available to speak at this Usergroup. Feel free to circulate this event link to all who may need it. Please reserve your tickets now as tickets for places will close soon. There are some great speakers and topics, for details please see the agenda on Eventbrite. This is a combined Spectrum Scale, Spectrum Archive, Spectrum Protect and Spectrum LSF event. We are looking forwards to a great Usergroup in Sydney. Thanks again to IBM for helping to arrange the venue and event booking. 
Best Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email GPFSUGAUS at gmail.com Web www.pawsey.org.au Regards, Chris Schlipalius IBM Champion 2019 Team Lead, Storage Infrastructure, Data & Visualisation, The Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 GPFSUGAUS at gmail.com From scale at us.ibm.com Mon Aug 19 14:24:31 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 19 Aug 2019 09:24:31 -0400 Subject: [gpfsug-discuss] =?utf-8?q?Running_mmcheckquota_on_a_file_system_?= =?utf-8?q?with_1=2E3B=09files?= In-Reply-To: <48973175-A514-4012-92F6-D63CF8054623@nuance.com> References: <48973175-A514-4012-92F6-D63CF8054623@nuance.com> Message-ID: Bob, like most questions of this time I think the answer depends on a number of variables. Generally we do not recommend running the mmcheckquota command during the peak usage of your Spectrum Scale system. As I think you know the command will increase the IO to the NSDs that hold metadata and the number of NSDs that hold metadata will contribute to the time it takes for the command to complete, i.e. more metadata NSDs should improve the overall execution time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 08/16/2019 11:57 AM Subject: [EXTERNAL] [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Sent by: gpfsug-discuss-bounces at spectrumscale.org I want to run mmcheckquota on a file system I have on my ESS. It has 1.3B files. Obviously it will take a long time, but I?m looking for anyone who has guidance running this command on a large file system, what I should expect for IO load. - Should I set QOS to minimize impact? - Run directly on the ESS IO nodes, or some other nodes? Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=WUpGs2Rz2EHGkbHv8FBmwheiONLdqvqSIS2FfIlCcc4&s=LX8or_PP0SLUsKP9Kb0wWE1u5jz84jQR-paJuklTvu4&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Aug 19 14:54:36 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 19 Aug 2019 13:54:36 +0000 Subject: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Message-ID: <97DDC9A8-6278-4249-BE39-A19C693CB341@nuance.com> Thanks, - I kicked it off and it finished in about 12 hours, so much quicker than I expected. 
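For anyone who would rather throttle the run than schedule it off-peak, a rough sketch using QoS follows (the file system name and the IOPS figure are placeholders; mmcheckquota falls into the QoS maintenance class):

  # cap maintenance-class commands while leaving normal traffic untouched
  mmchqos fs1 --enable pool=*,maintenance=1000IOPS,other=unlimited

  # run the check, then watch what QoS is actually allowing
  mmcheckquota fs1
  mmlsqos fs1 --seconds 60

  # lift the cap again afterwards
  mmchqos fs1 --enable pool=*,maintenance=unlimited,other=unlimited

The trade-off, as noted elsewhere in this thread, is a considerably longer elapsed time for the check.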
Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: on behalf of IBM Spectrum Scale
Reply-To: gpfsug main discussion list
Date: Monday, August 19, 2019 at 8:24 AM
To: gpfsug main discussion list
Subject: [EXTERNAL] Re: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files

Bob, like most questions of this time I think the answer depends on a number of variables. Generally we do not recommend running the mmcheckquota command during the peak usage of your Spectrum Scale system. As I think you know the command will increase the IO to the NSDs that hold metadata and the number of NSDs that hold metadata will contribute to the time it takes for the command to complete, i.e. more metadata NSDs should improve the overall execution time.

Regards, The Spectrum Scale (GPFS) team

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ewahl at osc.edu Mon Aug 19 21:01:14 2019
From: ewahl at osc.edu (Wahl, Edward)
Date: Mon, 19 Aug 2019 20:01:14 +0000
Subject: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files
In-Reply-To: <97DDC9A8-6278-4249-BE39-A19C693CB341@nuance.com>
References: <97DDC9A8-6278-4249-BE39-A19C693CB341@nuance.com>
Message-ID:

I'm assuming that was a run in the foreground and not using QoS? Our timings sound roughly similar for a foreground run under 4.2.3.x: 1 hour and ~2 hours for 100 million and 300 million files, respectively. Also, I'm assuming actual file counts, not inode counts! Background is, of course, all over the place with QoS. I've seen between 8-12 hours for just 100 million files, but the NSDs on that FS were middling busy during those periods.

I'd love to know if IBM has any "best practice" guidance for running mmcheckquota.

Ed

________________________________
From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Oesterlin, Robert
Sent: Monday, August 19, 2019 9:54 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files

Thanks, - I kicked it off and it finished in about 12 hours, so much quicker than I expected.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: on behalf of IBM Spectrum Scale
Reply-To: gpfsug main discussion list
Date: Monday, August 19, 2019 at 8:24 AM
To: gpfsug main discussion list
Subject: [EXTERNAL] Re: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files

Bob, like most questions of this time I think the answer depends on a number of variables. Generally we do not recommend running the mmcheckquota command during the peak usage of your Spectrum Scale system. As I think you know the command will increase the IO to the NSDs that hold metadata and the number of NSDs that hold metadata will contribute to the time it takes for the command to complete, i.e. more metadata NSDs should improve the overall execution time.

Regards, The Spectrum Scale (GPFS) team

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Renar.Grunenberg at huk-coburg.de Tue Aug 20 12:06:51 2019
From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar)
Date: Tue, 20 Aug 2019 11:06:51 +0000
Subject: [gpfsug-discuss] Spectrum Scale Technote mmap
Message-ID:

Hello all, can anyone clarify which levels are affected, i.e. in which PTF the problem is present and in which it is not? The abstract refers to v5.0.3.0 through 5.0.3.2, but the content says 5.0.3.0 to 5.0.3.3?
https://www-01.ibm.com/support/docview.wss?uid=ibm10960396 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Aug 20 13:39:30 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 20 Aug 2019 08:39:30 -0400 Subject: [gpfsug-discuss] Spectrum Scale Technote mmap In-Reply-To: References: Message-ID: Since Spectrum Scale 5.0.3.3 has not yet been released I think the reference to it in the notice was incorrect. It should have referred to version 5.0.3.2 as it does in other statements. Thanks for noting the discrepancy. I will alert the appropriate folks so this can be fixed. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 08/20/2019 07:14 AM Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Technote mmap Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, can everyone clarify the effected Level?s in witch ptf is the problem and in witch is not. The Abstract mean for v5.0.3.0 to 5.0.3.2. But in the content it says 5.0.3.0 to 5.0.3.3? https://www-01.ibm.com/support/docview.wss?uid=ibm10960396 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. 
in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=7HNkFdDgnBPjNYNHC2exZU3YxzqRkjfxYIu7Uxfma_k&s=e7nXQUVcE5O1J1kBoN1r9KLJtho2yEAJHdTsUyFkMXA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Aug 21 17:03:12 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 21 Aug 2019 16:03:12 +0000 Subject: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x Message-ID: I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: 2. This is a known defect. The problem has been fixed through D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock A companion fix is D.1073753: Assert that the lock mode in DirLTE::lock is strong enough The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From scale at us.ibm.com Wed Aug 21 18:10:47 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 21 Aug 2019 13:10:47 -0400 Subject: [gpfsug-discuss] =?utf-8?q?mmfsd_segfault/signal_6_on_dirop=2EC?= =?utf-8?b?OjQ1NDggaW4gR1BGUwk1LjAuMi54?= In-Reply-To: References: Message-ID: As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ryan Novosielski To: gpfsug main discussion list Date: 08/21/2019 12:04 PM Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: 2. This is a known defect. The problem has been fixed through D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock A companion fix is D.1073753: Assert that the lock mode in DirLTE::lock is strong enough The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=2DWKJiKyweSkGrSB_31bZQerI4xIgc6Z_Pw7iTGZpH4&s=oLoaU67CVtDLGyv_LZO8AqZRU739wj1q-PysELBsBow&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Aug 21 18:13:44 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 21 Aug 2019 17:13:44 +0000 Subject: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x In-Reply-To: References: Message-ID: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 12:04 PM > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: > > 2. This is a known defect. > The problem has been fixed through > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > A companion fix is > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... 
The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Wed Aug 21 18:20:57 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 21 Aug 2019 13:20:57 -0400 Subject: [gpfsug-discuss] =?utf-8?q?mmfsd_segfault/signal_6_on_dirop=2EC?= =?utf-8?b?OjQ1NDggaW4JR1BGUwk1LjAuMi54?= In-Reply-To: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> References: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> Message-ID: To my knowledge there has been no notification sent regarding this problem. Generally we only notify customers about problems that impact file system data corruption or data loss. This problem does cause the GPFS instance to abort and restart (assert) but it does not impact file system data. It seems in your case you may have been encountering the problem frequently. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ryan Novosielski To: gpfsug main discussion list Date: 08/21/2019 01:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. 
Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 12:04 PM > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: > > 2. This is a known defect. > The problem has been fixed through > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > A companion fix is > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=Vb5n4LNvsDMO5ku78vPAo49t9F87b7EsHcty3P7vTwo&s=yjhPrnHKQtpBzlJ-8gaCH4CHoo6pMXFMf1vLsCFg9WU&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=Vb5n4LNvsDMO5ku78vPAo49t9F87b7EsHcty3P7vTwo&s=yjhPrnHKQtpBzlJ-8gaCH4CHoo6pMXFMf1vLsCFg9WU&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=Vb5n4LNvsDMO5ku78vPAo49t9F87b7EsHcty3P7vTwo&s=yjhPrnHKQtpBzlJ-8gaCH4CHoo6pMXFMf1vLsCFg9WU&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Aug 21 18:34:03 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 21 Aug 2019 17:34:03 +0000 Subject: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x In-Reply-To: References: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> Message-ID: <6170DFBB-070A-4C72-816C-4D4F3D8B77FF@rutgers.edu> If there is any means for feedback, I really think that anything that causes a crash of mmfsd is absolutely cause to send a notice. Regardless of data corruption, it makes the software unusable in production under certain circumstances. There was a large customer impact at our site. We have a reproducible case if it is useful. One customer workload crashed every time, though it took almost a full day to get to that point so you can imagine the time wasted. > On Aug 21, 2019, at 1:20 PM, IBM Spectrum Scale wrote: > > To my knowledge there has been no notification sent regarding this problem. Generally we only notify customers about problems that impact file system data corruption or data loss. This problem does cause the GPFS instance to abort and restart (assert) but it does not impact file system data. It seems in your case you may have been encountering the problem frequently. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 01:14 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > From: Ryan Novosielski > > To: gpfsug main discussion list > > Date: 08/21/2019 12:04 PM > > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: > > > > 2. This is a known defect. > > The problem has been fixed through > > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > > A companion fix is > > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). 
> > > > -- > > ____ > > || \\UTGERS, |---------------------------*O*--------------------------- > > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > > `' > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Wed Aug 21 18:46:40 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 21 Aug 2019 13:46:40 -0400 Subject: [gpfsug-discuss] =?utf-8?q?mmfsd_segfault/signal_6_on_dirop=2EC?= =?utf-8?b?OjQ1NDgJaW4JR1BGUwk1LjAuMi54?= In-Reply-To: <6170DFBB-070A-4C72-816C-4D4F3D8B77FF@rutgers.edu> References: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> <6170DFBB-070A-4C72-816C-4D4F3D8B77FF@rutgers.edu> Message-ID: We do appreciate the feedback. Since Spectrum Scale is a cluster based solution we do not consider the failure of a single node significant since the cluster will adjust to the loss of the node and access to the file data is not lost. It seems in this specific instance this problem was having a more significant impact in your environment. Presumably you have installed the available fix and are no longer encountering the problem. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ryan Novosielski To: gpfsug main discussion list Date: 08/21/2019 01:34 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org If there is any means for feedback, I really think that anything that causes a crash of mmfsd is absolutely cause to send a notice. Regardless of data corruption, it makes the software unusable in production under certain circumstances. There was a large customer impact at our site. We have a reproducible case if it is useful. One customer workload crashed every time, though it took almost a full day to get to that point so you can imagine the time wasted. > On Aug 21, 2019, at 1:20 PM, IBM Spectrum Scale wrote: > > To my knowledge there has been no notification sent regarding this problem. 
Generally we only notify customers about problems that impact file system data corruption or data loss. This problem does cause the GPFS instance to abort and restart (assert) but it does not impact file system data. It seems in your case you may have been encountering the problem frequently. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 01:14 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > From: Ryan Novosielski > > To: gpfsug main discussion list > > Date: 08/21/2019 12:04 PM > > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. 
From IBM Support: > > > > 2. This is a known defect. > > The problem has been fixed through > > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > > A companion fix is > > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). > > > > -- > > ____ > > || \\UTGERS, |---------------------------*O*--------------------------- > > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > > `' > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From novosirj at rutgers.edu Wed Aug 21 18:54:28 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 21 Aug 2019 17:54:28 +0000 Subject: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x In-Reply-To: References: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> <6170DFBB-070A-4C72-816C-4D4F3D8B77FF@rutgers.edu> Message-ID: <8D9E4CAF-8B31-42C5-82B3-3B5EE4D132D3@rutgers.edu> There again, I would disagree. It is true that any one occurrence would affect one node, but it?s quite possible to conceive of a scenario where it happened on all nodes at around the same time, if conducting the right operations. We haven?t installed the EFIX (we have Lenovo equipment and generally use the Lenovo software ? not sure how that fits in here), but we applied the described workaround (disableAssert=dirop.C:4548) and are planning to upgrade our clients as soon as we can get 5.0.3.3 software (since <5.0.3.3 contains the other mmap read bug). > On Aug 21, 2019, at 1:46 PM, IBM Spectrum Scale wrote: > > We do appreciate the feedback. Since Spectrum Scale is a cluster based solution we do not consider the failure of a single node significant since the cluster will adjust to the loss of the node and access to the file data is not lost. It seems in this specific instance this problem was having a more significant impact in your environment. Presumably you have installed the available fix and are no longer encountering the problem. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 01:34 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > If there is any means for feedback, I really think that anything that causes a crash of mmfsd is absolutely cause to send a notice. Regardless of data corruption, it makes the software unusable in production under certain circumstances. There was a large customer impact at our site. We have a reproducible case if it is useful. One customer workload crashed every time, though it took almost a full day to get to that point so you can imagine the time wasted. > > > On Aug 21, 2019, at 1:20 PM, IBM Spectrum Scale wrote: > > > > To my knowledge there has been no notification sent regarding this problem. Generally we only notify customers about problems that impact file system data corruption or data loss. This problem does cause the GPFS instance to abort and restart (assert) but it does not impact file system data. It seems in your case you may have been encountering the problem frequently. 
> > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > From: Ryan Novosielski > > To: gpfsug main discussion list > > Date: 08/21/2019 01:14 PM > > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. > > > > -- > > ____ > > || \\UTGERS, |---------------------------*O*--------------------------- > > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > > `' > > > > > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > > > > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > > > > > Regards, The Spectrum Scale (GPFS) team > > > > > > ------------------------------------------------------------------------------------------------------------------ > > > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > > > > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > > > > > From: Ryan Novosielski > > > To: gpfsug main discussion list > > > Date: 08/21/2019 12:04 PM > > > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > > > > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: > > > > > > 2. This is a known defect. 
> > > The problem has been fixed through > > > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > > > A companion fix is > > > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > > > > > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > > > > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). > > > > > > -- > > > ____ > > > || \\UTGERS, |---------------------------*O*--------------------------- > > > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > > > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > > > `' > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Craig.Abram at gmfinancial.com Thu Aug 29 14:58:36 2019 From: Craig.Abram at gmfinancial.com (Craig.Abram at gmfinancial.com) Date: Thu, 29 Aug 2019 13:58:36 +0000 Subject: [gpfsug-discuss] Backup question Message-ID: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Are there any other options to backup up GPFS other that Spectrum Protect ? ________________________________ Notice to all users The information contained in this email, including any attachment(s) is confidential and intended solely for the addressee and may contain privileged, confidential or restricted information. If you are not the intended recipient or responsible to deliver to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you received this message in error please notify the originator and then delete. Neither, the sender or GMF's network will be liable for direct, indirect or consequential infection by viruses associated with this email. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Aug 29 15:13:14 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 29 Aug 2019 14:13:14 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: You can backup GPFS with basically anything that can read a POSIX filesystem. Do you refer to refer to mmbackup integration? -- Cheers > On 29 Aug 2019, at 17.09, Craig.Abram at gmfinancial.com wrote: > > > > Are there any other options to backup up GPFS other that Spectrum Protect ? > > > > Notice to all users The information contained in this email, including any attachment(s) is confidential and intended solely for the addressee and may contain privileged, confidential or restricted information. If you are not the intended recipient or responsible to deliver to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you received this message in error please notify the originator and then delete. Neither, the sender or GMF's network will be liable for direct, indirect or consequential infection by viruses associated with this email. Ellei edell?? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Aug 29 15:14:34 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 29 Aug 2019 14:14:34 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: An HTML attachment was scrubbed... URL: From oehmes at gmail.com Thu Aug 29 15:19:23 2019 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 29 Aug 2019 07:19:23 -0700 Subject: [gpfsug-discuss] Backup question In-Reply-To: References: Message-ID: <08F5DE8C-9C95-45D8-8711-229D76985329@gmail.com> while it is true that you can backup the data with everything that can read a POSIX filesystem, you will miss all the metadata associated like extended attributes and ACL?s. beside mmbackup (which uses spectrum protect) DDN also offers a product for data management including backup/restore (but also migration and other scenarios) that preserves the metadata information. you can get more info here ?> https://www.ddn.com/products/dataflow-backup-archive-data-migration/ sven Sent from my iPad > On Aug 29, 2019, at 7:13 AM, Luis Bolinches wrote: > > You can backup GPFS with basically anything that can read a POSIX filesystem. > > Do you refer to refer to mmbackup integration? > > -- > Cheers > >> On 29 Aug 2019, at 17.09, Craig.Abram at gmfinancial.com wrote: >> >> >> >> Are there any other options to backup up GPFS other that Spectrum Protect ? >> >> >> >> Notice to all users The information contained in this email, including any attachment(s) is confidential and intended solely for the addressee and may contain privileged, confidential or restricted information. If you are not the intended recipient or responsible to deliver to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you received this message in error please notify the originator and then delete. 
Neither, the sender or GMF's network will be liable for direct, indirect or consequential infection by viruses associated with this email. > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Aug 30 03:30:22 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 29 Aug 2019 19:30:22 -0700 Subject: [gpfsug-discuss] Hold the Date - September 23 and 24 In-Reply-To: <3F2B08E9-C6E3-412B-9308-D79E3480C5DA@lbl.gov> References: <3F2B08E9-C6E3-412B-9308-D79E3480C5DA@lbl.gov> Message-ID: <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> Hello, You will now find the nearly complete agenda here: https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ As noted before, the event is free, but please do register below to help with catering planning. You can find more information about the full HPCXXL event here: http://hpcxxl.org/ Any questions let us know. Hope to see you there! -Kristy > On Jul 2, 2019, at 10:45 AM, Kristy Kallback-Rose wrote: > > Hello, > > HPCXXL will be hosted by NERSC (Berkeley, CA) this September. As part of this event, there will be approximately a day and a half on GPFS content. We have done this type of event in the past, and as before, the GPFS days will be free to attend, but you do need to register. > > We?ll have more details soon, mark your calendars. > > Initial details: https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ > > Best, > Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Fri Aug 30 09:07:24 2019 From: jtucker at pixitmedia.com (Jez Tucker) Date: Fri, 30 Aug 2019 09:07:24 +0100 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: Hey ho, ? You may wish to evaluate NetBackup (I have not). https://www.veritas.com/content/support/en_US/doc/18716246-126559472-0/v107946958-126559472 You can accelerate most 3rd party backup solutions by driving file lists via policy. With a bit of additional development, it's not hard to achieve something extremely close to mmbackup. Perhaps the prudent question is; - What issue(s) make Spectrum Protect not the first solution of choice? Best, Jez On 29/08/2019 14:58, Craig.Abram at gmfinancial.com wrote: > > Are there any other options to backup up GPFS other that Spectrum > Protect ? > > > ------------------------------------------------------------------------ > > Notice to all users The information contained in this email, including > any attachment(s) is confidential and intended solely for the > addressee and may contain privileged, confidential or restricted > information. If you are not the intended recipient or responsible to > deliver to the intended recipient, you are hereby notified that any > dissemination, distribution or copying of this communication is > strictly prohibited. If you received this message in error please > notify the originator and then delete. 
Neither, the sender or GMF's > network will be liable for direct, indirect or consequential infection > by viruses associated with this email. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Aug 30 14:16:17 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 30 Aug 2019 09:16:17 -0400 Subject: [gpfsug-discuss] Backup question In-Reply-To: References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: Certainly fair and understandable to look at alternatives to the IBM backup/restore/HSM products and possibly mixing-matching. But thanks, Jez for raising the question: What do you see as strengths and weaknesses of the alternatives, IBM and others? AND as long as you are considering alternatives, here's another: HPSS http://www.hpss-collaboration.org/ghi.shtml which, from the beginning, was designed to work with GPFS. From: Jez Tucker To: gpfsug-discuss at spectrumscale.org Date: 08/30/2019 04:07 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Backup question Sent by: gpfsug-discuss-bounces at spectrumscale.org Hey ho, ? You may wish to evaluate NetBackup (I have not). https://www.veritas.com/content/support/en_US/doc/18716246-126559472-0/v107946958-126559472 You can accelerate most 3rd party backup solutions by driving file lists via policy. With a bit of additional development, it's not hard to achieve something extremely close to mmbackup. Perhaps the prudent question is; - What issue(s) make Spectrum Protect not the first solution of choice? Best, Jez On 29/08/2019 14:58, Craig.Abram at gmfinancial.com wrote: Are there any other options to backup up GPFS other that Spectrum Protect ? Notice to all users The information contained in this email, including any attachment(s) is confidential and intended solely for the addressee and may contain privileged, confidential or restricted information. If you are not the intended recipient or responsible to deliver to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you received this message in error please notify the originator and then delete. Neither, the sender or GMF's network will be liable for direct, indirect or consequential infection by viruses associated with this email. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Jez Tucker Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=U27EFbJGDImYvJu1bHWkn2F8wDttdtFDvQnazTtsZU4&s=OeU7xEE9o5ycn4_u3x4W3W_qraCPmVvftuWSfObWmso&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From daniel.kidger at uk.ibm.com Fri Aug 30 16:07:45 2019 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Fri, 30 Aug 2019 15:07:45 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: References: , <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=0ABB0EF5DFDB98B68f9e8a93df938690918c0AB at .gif Type: image/gif Size: 105 bytes Desc: not available URL:
From alex.mamach at northwestern.edu Fri Aug 9 18:46:34 2019 From: alex.mamach at northwestern.edu (Alexander John Mamach) Date: Fri, 9 Aug 2019 17:46:34 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles Message-ID: Hi folks, We're currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Fri Aug 9 19:03:09 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 9 Aug 2019 18:03:09 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Aug 9 19:09:18 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 9 Aug 2019 18:09:18 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: Message-ID: <3918E7F4-D499-489B-9D23-22B5C456D637@rutgers.edu> I've needed the same sort of thing - we use NHC to check for FS status and we've had cases that we were not able to detect because they were in this "Stale file handle" state. GPFS doesn't always seem to behave in the way Linux would expect. > On Aug 9, 2019, at 2:03 PM, Frederick Stock wrote: > > Are you able to explain why you want to check for stale file handles? Are you attempting to detect failures of some sort, and why do the existing mechanisms in GPFS not provide the functionality you require? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > ----- Original message ----- > From: Alexander John Mamach > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Checking for Stale File Handles > Date: Fri, Aug 9, 2019 1:46 PM > > Hi folks, > > We're currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. > > Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously.
> > Thanks, > > Alex > > Senior Systems Administrator > > Research Computing Infrastructure > Northwestern University Information Technology (NUIT) > > 2020 Ridge Ave > Evanston, IL 60208-4311 > > O: (847) 491-2219 > M: (312) 887-1881 > www.it.northwestern.edu > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From rmoye at quantlab.com Fri Aug 9 19:03:05 2019 From: rmoye at quantlab.com (Roger Moye) Date: Fri, 9 Aug 2019 18:03:05 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: Message-ID: <5db3a0bf06754c73b83a50db0b577847@quantlab.com> You might try something like: timeout --kill-after=5 --signal=SIGKILL 5 test -d /some/folder/on/gpfs [cid:image001.png at 01D22319.C7D5D540] Roger Moye HPC Engineer 713.425.6236 Office 713.898.0021 Mobile QUANTLAB Financial, LLC 3 Greenway Plaza Suite 200 Houston, Texas 77046 www.quantlab.com From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Alexander John Mamach Sent: Friday, August 9, 2019 12:47 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Checking for Stale File Handles Hi folks, We're currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 3364 bytes Desc: image001.png URL: From ewahl at osc.edu Fri Aug 9 19:54:48 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 9 Aug 2019 18:54:48 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: Message-ID: We use NHC here (Node Health Check) from LBNL and our SS clients are almost all using NFS root. We have a check where we look for access to a couple of dotfiles (we have multiple SS file systems) and will mark a node offline if the checks fail. Many things can contribute to the failure of a single client node as we all know. Our checks are for actual node health on the clients, NOT to assess the health of the File Systems themselves. I will normally see MANY other problems from other monitoring sources long before I normally see stale file handles at the client level. We did have to turn up the timeout for a check of the file to return on very busy clients, but we've haven't seen slowdowns due to hundreds of nodes all checking the file at the same time. Localized node slowdowns will occasionally mark a node offline for this check here and there (normally a node that is extremely busy), but the next check will put the node right back online in the batch system. Ed Wahl Ohio Supercomputer Center ewahl at osc.edu ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Alexander John Mamach Sent: Friday, August 9, 2019 1:46 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Checking for Stale File Handles Hi folks, We?re currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.mamach at northwestern.edu Fri Aug 9 21:32:49 2019 From: alex.mamach at northwestern.edu (Alexander John Mamach) Date: Fri, 9 Aug 2019 20:32:49 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: , Message-ID: Hi Fred, We sometimes find a node will show that GPFS is active when running mmgetstate, but one of our GPFS filesystems, (such as our home or projects filesystems) are inaccessible to users, while the other GPFS-mounted filesystems behave as expected. Our current node health checks don?t always detect this, especially when it?s for a resource-based mount that doesn?t impact the node but would impact jobs trying to run on the node. If there is something native to GPFS that can detect this, all the better, but I?m simply unaware of how to do so. 
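(For what it is worth, the timeout-style probe Roger suggested earlier in the thread generalizes to something like the sketch below; the mount points, the 10-second budget and the kill grace period are placeholders we would still have to tune:

#!/bin/bash
# Probe each GPFS mount with a hard time limit so a hung or stale mount
# fails the check instead of hanging the health script itself.
rc=0
for m in /gpfs/home /gpfs/projects; do
    if ! timeout -k 5 10 stat -t "$m/." >/dev/null 2>&1; then
        echo "probe of $m failed or timed out on $(hostname)" >&2
        rc=1
    fi
done
exit $rc

A timeout alone only says "slow or stale", so it would still need to be combined with something that can tell the two apart.)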
Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederick Stock Sent: Friday, August 9, 2019 1:03:09 PM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Checking for Stale File Handles Are you able to explain why you want to check for stale file handles? Are you attempting to detect failures of some sort, and why do the existing mechanisms in GPFS not provide the functionality you require? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Alexander John Mamach Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking for Stale File Handles Date: Fri, Aug 9, 2019 1:46 PM Hi folks, We?re currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Mon Aug 12 09:30:14 2019 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Mon, 12 Aug 2019 10:30:14 +0200 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: , Message-ID: Hi Alex, did you try mmhealth ? It should detect stale file handles of the gpfs filesystems already and report a "stale_mount" event. 
Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development - Release Lead Architect (4.2.x) Spectrum Scale RAS Architect --------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49 70342744105 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ----------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk WittkoppSitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Alexander John Mamach To: gpfsug main discussion list Cc: "gpfsug-discuss at spectrumscale.org" Date: 09/08/2019 22:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking for Stale File Handles Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Fred, We sometimes find a node will show that GPFS is active when running mmgetstate, but one of our GPFS filesystems, (such as our home or projects filesystems) are inaccessible to users, while the other GPFS-mounted filesystems behave as expected. Our current node health checks don?t always detect this, especially when it?s for a resource-based mount that doesn?t impact the node but would impact jobs trying to run on the node. If there is something native to GPFS that can detect this, all the better, but I?m simply unaware of how to do so. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederick Stock Sent: Friday, August 9, 2019 1:03:09 PM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Checking for Stale File Handles Are you able to explain why you want to check for stale file handles? Are you attempting to detect failures of some sort, and why do the existing mechanisms in GPFS not provide the functionality you require? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Alexander John Mamach Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking for Stale File Handles Date: Fri, Aug 9, 2019 1:46 PM Hi folks, We?re currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. 
Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=9dCEbNr27klWay2AcOfvOE1xq50K-CyRUu4qQx4HOlk&m=sUjgq9g2p2ncIpALAqAhOqt7blwynTJmgmFdYYik7MI&s=EFC3lNuf6koYPMPSWuYCNhwmIMUKKZ9mCQFhxVCYWLQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Aug 12 11:42:49 2019 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 12 Aug 2019 12:42:49 +0200 Subject: [gpfsug-discuss] Fileheat Message-ID: Hello, I am having difficulties with Spectrum Scale's fileheat feature on Spectrum Scale 5.0.2/5.0.3: The config has it activated: # mmlsconfig | grep fileHeat fileHeatPeriodMinutes 720 Now everytime I look at the files using mmapplypolicy I only see 0 for the fileheat. I have both tried reading files via nfs and locally. No difference, the fileheat always stays at zero. What could be wrong here? How to debug? We are exporting the filesystem using kernel NFS which is working fine. However, the documentation states that root access is not taken into account for fileheat, so I am wondering if that setup is supposed to work at all? Thx, Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From alvise.dorigo at psi.ch Mon Aug 12 14:03:47 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 12 Aug 2019 13:03:47 +0000 Subject: [gpfsug-discuss] AFM and SELinux Message-ID: <83A6EEB0EC738F459A39439733AE80452BE96124@MBX114.d.ethz.ch> Dear GPFS users, does anybody know if AFM behaves correctly if the AFM gateway has SELinux "Disabled" and NFS server has SElinux "Enforcing" ? thanks, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Aug 12 14:38:59 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 12 Aug 2019 09:38:59 -0400 Subject: [gpfsug-discuss] Fileheat In-Reply-To: References: Message-ID: My Admin guide says: The loss percentage and period are set via the configuration variables fileHeatLossPercent and fileHeatPeriodMinutes. By default, the file access temperature is not tracked. To use access temperature in policy, the tracking must first be enabled. To do this, set the two configuration variables as follows: fileHeatLossPercent The percentage (between 0 and 100) of file access temperature dissipated over the fileHeatPeriodMinutes time. The default value is 10. Chapter 25. Information lifecycle management for IBM Spectrum Scale 361 fileHeatPeriodMinutes The number of minutes defined for the recalculation of file access temperature. 
To turn on tracking, fileHeatPeriodMinutes must be set to a nonzero value. The default value is 0 SO Try setting both! ALSO to take effect you may have to mmshutdown and mmstartup, at least on the (client gpfs) nodes that are accessing the files of interest. From: Ulrich Sibiller To: gpfsug-discuss at gpfsug.org Date: 08/12/2019 06:50 AM Subject: [EXTERNAL] [gpfsug-discuss] Fileheat Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I am having difficulties with Spectrum Scale's fileheat feature on Spectrum Scale 5.0.2/5.0.3: The config has it activated: # mmlsconfig | grep fileHeat fileHeatPeriodMinutes 720 Now everytime I look at the files using mmapplypolicy I only see 0 for the fileheat. I have both tried reading files via nfs and locally. No difference, the fileheat always stays at zero. What could be wrong here? How to debug? We are exporting the filesystem using kernel NFS which is working fine. However, the documentation states that root access is not taken into account for fileheat, so I am wondering if that setup is supposed to work at all? Thx, Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=vQ1ASSRY5HseAqfNFONyHvd4crfRlWttZe2Uti0rx1s&s=Q7wAWezSHse5uPfvwobmcmASiGvpLfbKy97sqRkvJ-M&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From u.sibiller at science-computing.de Tue Aug 13 10:22:27 2019 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Tue, 13 Aug 2019 11:22:27 +0200 Subject: [gpfsug-discuss] Fileheat In-Reply-To: References: Message-ID: On 12.08.19 15:38, Marc A Kaplan wrote: > My Admin guide says: > > The loss percentage and period are set via the configuration > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By default, the file access temperature > is not > tracked. To use access temperature in policy, the tracking must first be enabled. To do this, set > the two > configuration variables as follows:* Yes, I am aware of that. > fileHeatLossPercent* > The percentage (between 0 and 100) of file access temperature dissipated over the* > fileHeatPeriodMinutes *time. The default value is 10. > Chapter 25. Information lifecycle management for IBM Spectrum Scale *361** > fileHeatPeriodMinutes* > The number of minutes defined for the recalculation of file access temperature. To turn on > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. The default value is 0 > > > SO Try setting both! Well, I have not because the documentation explicitly mentions a default. What's the point of a default if I have to explicitly configure it? 
> ALSO to take effect you may have to mmshutdown and mmstartup, at least on the (client gpfs) nodes > that are accessing the files of interest. I have now configured both parameters and restarted GPFS. Ran a tar over a directory - still no change. I will wait for 720minutes and retry (tomorrow). Thanks Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From janfrode at tanso.net Tue Aug 13 11:22:13 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 13 Aug 2019 12:22:13 +0200 Subject: [gpfsug-discuss] Fileheat In-Reply-To: References: Message-ID: What about filesystem atime updates. We recently changed the default to ?relatime?. Could that maybe influence heat tracking? -jf tir. 13. aug. 2019 kl. 11:29 skrev Ulrich Sibiller < u.sibiller at science-computing.de>: > On 12.08.19 15:38, Marc A Kaplan wrote: > > My Admin guide says: > > > > The loss percentage and period are set via the configuration > > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By default, > the file access temperature > > is not > > tracked. To use access temperature in policy, the tracking must first be > enabled. To do this, set > > the two > > configuration variables as follows:* > > Yes, I am aware of that. > > > fileHeatLossPercent* > > The percentage (between 0 and 100) of file access temperature dissipated > over the* > > fileHeatPeriodMinutes *time. The default value is 10. > > Chapter 25. Information lifecycle management for IBM Spectrum Scale > *361** > > fileHeatPeriodMinutes* > > The number of minutes defined for the recalculation of file access > temperature. To turn on > > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. The > default value is 0 > > > > > > SO Try setting both! > > Well, I have not because the documentation explicitly mentions a default. > What's the point of a > default if I have to explicitly configure it? > > > ALSO to take effect you may have to mmshutdown and mmstartup, at least > on the (client gpfs) nodes > > that are accessing the files of interest. > > I have now configured both parameters and restarted GPFS. Ran a tar over a > directory - still no > change. I will wait for 720minutes and retry (tomorrow). > > Thanks > > Uli > > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Aug 13 14:32:53 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 13 Aug 2019 09:32:53 -0400 Subject: [gpfsug-discuss] Fileheat - does work! 
Complete test/example provided here. In-Reply-To: References: Message-ID: Yes, you are correct. It should only be necessary to set fileHeatPeriodMinutes, since the loss percent does have a default value. But IIRC (I implemented part of this!) you must restart the daemon to get those fileheat parameter(s) "loaded"and initialized into the daemon processes. Not fully trusting my memory... I will now "prove" this works today as follows: To test, create and re-read a large file with dd... [root@/main/gpfs-git]$mmchconfig fileHeatPeriodMinutes=60 mmchconfig: Command successfully completed ... [root@/main/gpfs-git]$mmlsconfig | grep -i heat fileHeatPeriodMinutes 60 [root@/main/gpfs-git]$mmshutdown ... [root@/main/gpfs-git]$mmstartup ... [root@/main/gpfs-git]$mmmount c23 ... [root@/main/gpfs-git]$ls -l /c23/10g -rw-r--r--. 1 root root 10737418240 May 16 15:09 /c23/10g [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g file name: /c23/10g security.selinux (NO fileheat attribute yet...) [root@/main/gpfs-git]$dd if=/c23/10g bs=1M of=/dev/null ... After the command finishes, you may need to wait a while for the metadata to flush to the inode on disk ... or you can force that with an unmount or a mmfsctl... Then the fileheat attribute will appear (I just waited by answering another email... No need to do any explicit operations on the file system..) [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g file name: /c23/10g security.selinux gpfs.FileHeat To see its hex string value: [root@/main/gpfs-git]$mmlsattr -d -X -L /c23/10g file name: /c23/10g ... security.selinux: 0x756E636F6E66696E65645F753A6F626A6563745F723A756E6C6162656C65645F743A733000 gpfs.FileHeat: 0x000000EE42A40400 Which will be interpreted by mmapplypolicy... YES, the interpretation is relative to last access time and current time, and done by a policy/sql function "computeFileHeat" (You could find this using m4 directives in your policy file...) define([FILE_HEAT],[computeFileHeat(CURRENT_TIMESTAMP-ACCESS_TIME,xattr ('gpfs.FileHeat'),KB_ALLOCATED)]) Well gone that far, might as well try mmapplypolicy too.... [root@/main/gpfs-git]$cat /gh/policies/fileheat.policy define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END]) rule fh1 external list 'fh' exec '' rule fh2 list 'fh' weight(FILE_HEAT) show(DISPLAY_NULL(xattr_integer ('gpfs.FileHeat',1,4,'B')) || ' ' || DISPLAY_NULL(xattr_integer('gpfs.FileHeat',5,2,'B')) || ' ' || DISPLAY_NULL(xattr_integer('gpfs.FileHeat',7,2,'B')) || ' ' || DISPLAY_NULL(FILE_HEAT) || ' ' || DISPLAY_NULL(hex(xattr('gpfs.FileHeat'))) || ' ' || getmmconfig('fileHeatPeriodMinutes') || ' ' || getmmconfig('fileHeatLossPercent') || ' ' || getmmconfig('clusterName') ) [root@/main/gpfs-git]$mmapplypolicy /c23 --maxdepth 1 -P /gh/policies/fileheat.policy -I test -L 3 ... <1> /c23/10g RULE 'fh2' LIST 'fh' WEIGHT(0.022363) SHOW( 238 17060 1024 +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) ... WEIGHT(0.022363) LIST 'fh' /c23/10g SHOW(238 17060 1024 +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) From: Jan-Frode Myklebust To: gpfsug main discussion list Date: 08/13/2019 06:22 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Fileheat Sent by: gpfsug-discuss-bounces at spectrumscale.org What about filesystem atime updates. We recently changed the default to ?relatime?. Could that maybe influence heat tracking? ? -jf tir. 13. aug. 2019 kl. 
11:29 skrev Ulrich Sibiller < u.sibiller at science-computing.de>: On 12.08.19 15:38, Marc A Kaplan wrote: > My Admin guide says: > > The loss percentage and period are set via the configuration > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By default, the file access temperature > is not > tracked. To use access temperature in policy, the tracking must first be enabled. To do this, set > the two > configuration variables as follows:* Yes, I am aware of that. > fileHeatLossPercent* > The percentage (between 0 and 100) of file access temperature dissipated over the* > fileHeatPeriodMinutes *time. The default value is 10. > Chapter 25. Information lifecycle management for IBM Spectrum Scale *361** > fileHeatPeriodMinutes* > The number of minutes defined for the recalculation of file access temperature. To turn on > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. The default value is 0 > > > SO Try setting both! Well, I have not because the documentation explicitly mentions a default. What's the point of a default if I have to explicitly configure it? > ALSO to take effect you may have to mmshutdown and mmstartup, at least on the (client gpfs) nodes > that are accessing the files of interest. I have now configured both parameters and restarted GPFS. Ran a tar over a directory - still no change. I will wait for 720minutes and retry (tomorrow). Thanks Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=q-xjYq0Bimv3bYK1rhVMZ7jLvoEssmvfyMF0kcf5slc&s=buMugfNwUbsXJ2Gi04A3ehIQ0v-ORRc-Mb7sxaGgLrA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From abhisdav at in.ibm.com Wed Aug 14 07:38:56 2019 From: abhisdav at in.ibm.com (Abhishek Dave) Date: Wed, 14 Aug 2019 12:08:56 +0530 Subject: [gpfsug-discuss] AFM and SELinux In-Reply-To: <83A6EEB0EC738F459A39439733AE80452BE96124@MBX114.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE80452BE96124@MBX114.d.ethz.ch> Message-ID: Hi, SELinux is officially not supported with AFM & ADR as of now. Support will be added in upcoming release. May i know if you are using kernel nfs or Ganesha and on which release? In my opinion Kernel NFS should work. 
Thanks, Abhishek, Dave From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 08/12/2019 06:40 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM and SELinux Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFS users, does anybody know if AFM behaves correctly if the AFM gateway has SELinux "Disabled" and NFS server has SElinux "Enforcing" ? thanks, Alvise_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=gmKxLriPwxbU8DBgwjZJwuhzDr6RwM7JisZU6_htZRw&m=M2nCwpuhe4WfjF_Qcq5iyTJkZzqBrYYtd7g1pQcev4Q&s=n8ZNq9h3P_quAD-7nWT42DPt8ZQJNehK35O0Tn-Dtd8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From alvise.dorigo at psi.ch Wed Aug 14 08:17:24 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Wed, 14 Aug 2019 07:17:24 +0000 Subject: [gpfsug-discuss] AFM and SELinux In-Reply-To: References: <83A6EEB0EC738F459A39439733AE80452BE96124@MBX114.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE80452BE96501@MBX114.d.ethz.ch> Hi Dave, thanks for the answer. Sorry when I mentioned NFS I was not talking about CES service, but the NFS server that export the HOME filesystem to the gateway AFM. A ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Abhishek Dave [abhisdav at in.ibm.com] Sent: Wednesday, August 14, 2019 8:38 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM and SELinux Hi, SELinux is officially not supported with AFM & ADR as of now. Support will be added in upcoming release. May i know if you are using kernel nfs or Ganesha and on which release? In my opinion Kernel NFS should work. Thanks, Abhishek, Dave [Inactive hide details for "Dorigo Alvise (PSI)" ---08/12/2019 06:40:34 PM---Dear GPFS users, does anybody know if AFM behaves c]"Dorigo Alvise (PSI)" ---08/12/2019 06:40:34 PM---Dear GPFS users, does anybody know if AFM behaves correctly if the AFM gateway has SELinux "Disabled From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 08/12/2019 06:40 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM and SELinux Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear GPFS users, does anybody know if AFM behaves correctly if the AFM gateway has SELinux "Disabled" and NFS server has SElinux "Enforcing" ? thanks, Alvise_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From lore at cscs.ch Fri Aug 16 16:49:05 2019 From: lore at cscs.ch (Lo Re Giuseppe) Date: Fri, 16 Aug 2019 15:49:05 +0000 Subject: [gpfsug-discuss] SS on RHEL 7.7 Message-ID: Hi, Has anybody tried to run Spectrum Scale on RHEL 7.7? It is not listed on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linuxrest ?. 
yet I would be interested to know whether anyone is already running it, in test or production systems. Thanks! Giuseppe -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Aug 16 16:03:41 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 16 Aug 2019 15:03:41 +0000 Subject: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Message-ID: <48973175-A514-4012-92F6-D63CF8054623@nuance.com> I want to run mmcheckquota on a file system I have on my ESS. It has 1.3B files. Obviously it will take a long time, but I?m looking for anyone who has guidance running this command on a large file system, what I should expect for IO load. - Should I set QOS to minimize impact? - Run directly on the ESS IO nodes, or some other nodes? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Aug 16 17:01:40 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 16 Aug 2019 12:01:40 -0400 Subject: [gpfsug-discuss] SS on RHEL 7.7 In-Reply-To: References: Message-ID: <92DF159B-3BF6-411D-9DB5-48C3E7852667@brown.edu> I installed it on a client machine that was accidentally upgraded to rhel7.7. There were two type mismatch warnings during the gplbin rpm build but gpfs started up and mounted the filesystem successfully. Client is running ss 5.0.3. -- ddj Dave Johnson > On Aug 16, 2019, at 11:49 AM, Lo Re Giuseppe wrote: > > Hi, > > Has anybody tried to run Spectrum Scale on RHEL 7.7? > It is not listed on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linuxrest > ?. yet > I would be interested to know whether anyone is already running it, in test or production systems. > > Thanks! > > Giuseppe > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.schlipalius at pawsey.org.au Mon Aug 19 03:35:02 2019 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Mon, 19 Aug 2019 10:35:02 +0800 Subject: [gpfsug-discuss] Announcing 2019 October 18th Australian Spectrum Scale User Group event - call for user case speakers Message-ID: <9C44E510-3777-4706-A435-E9C374CD2180@pawsey.org.au> Hello all, This is the announcement for the Spectrum Scale Usergroup Sydney Australia on Friday the 18th October 2019. All current draft Australian Spectrum Scale User Group event details can be found here: http://bit.ly/2YOFQ3u We are calling for user case speakers please ? let Ulf or myself know if you are available to speak at this Usergroup. Feel free to circulate this event link to all who may need it. Please reserve your tickets now as tickets for places will close soon. There are some great speakers and topics, for details please see the agenda on Eventbrite. This is a combined Spectrum Scale, Spectrum Archive, Spectrum Protect and Spectrum LSF event. We are looking forwards to a great Usergroup in Sydney. Thanks again to IBM for helping to arrange the venue and event booking. 
Best Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email GPFSUGAUS at gmail.com Web www.pawsey.org.au Regards, Chris Schlipalius IBM Champion 2019 Team Lead, Storage Infrastructure, Data & Visualisation, The Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 GPFSUGAUS at gmail.com From scale at us.ibm.com Mon Aug 19 14:24:31 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 19 Aug 2019 09:24:31 -0400 Subject: [gpfsug-discuss] =?utf-8?q?Running_mmcheckquota_on_a_file_system_?= =?utf-8?q?with_1=2E3B=09files?= In-Reply-To: <48973175-A514-4012-92F6-D63CF8054623@nuance.com> References: <48973175-A514-4012-92F6-D63CF8054623@nuance.com> Message-ID: Bob, like most questions of this time I think the answer depends on a number of variables. Generally we do not recommend running the mmcheckquota command during the peak usage of your Spectrum Scale system. As I think you know the command will increase the IO to the NSDs that hold metadata and the number of NSDs that hold metadata will contribute to the time it takes for the command to complete, i.e. more metadata NSDs should improve the overall execution time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 08/16/2019 11:57 AM Subject: [EXTERNAL] [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Sent by: gpfsug-discuss-bounces at spectrumscale.org I want to run mmcheckquota on a file system I have on my ESS. It has 1.3B files. Obviously it will take a long time, but I?m looking for anyone who has guidance running this command on a large file system, what I should expect for IO load. - Should I set QOS to minimize impact? - Run directly on the ESS IO nodes, or some other nodes? Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=WUpGs2Rz2EHGkbHv8FBmwheiONLdqvqSIS2FfIlCcc4&s=LX8or_PP0SLUsKP9Kb0wWE1u5jz84jQR-paJuklTvu4&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Aug 19 14:54:36 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 19 Aug 2019 13:54:36 +0000 Subject: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Message-ID: <97DDC9A8-6278-4249-BE39-A19C693CB341@nuance.com> Thanks, - I kicked it off and it finished in about 12 hours, so much quicker than I expected. 
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Monday, August 19, 2019 at 8:24 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Bob, like most questions of this time I think the answer depends on a number of variables. Generally we do not recommend running the mmcheckquota command during the peak usage of your Spectrum Scale system. As I think you know the command will increase the IO to the NSDs that hold metadata and the number of NSDs that hold metadata will contribute to the time it takes for the command to complete, i.e. more metadata NSDs should improve the overall execution time. Regards, The Spectrum Scale (GPFS) team -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Mon Aug 19 21:01:14 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Mon, 19 Aug 2019 20:01:14 +0000 Subject: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files In-Reply-To: <97DDC9A8-6278-4249-BE39-A19C693CB341@nuance.com> References: <97DDC9A8-6278-4249-BE39-A19C693CB341@nuance.com> Message-ID: I'm assuming that was a run in the foreground and not using QoS? Our timings sound roughly similar for a Foreground run under 4.2.3.x. 1 hour and ~2 hours for 100million and 300 million each. Also I'm assuming actual file counts, not inode counts! Background is, of course, all over the place with QoS. I've seen between 8-12 hours for just 100 million files, but the NSDs on that FS were middling busy during those periods. I'd love to know if IBM has any "best practice" guidance for running mmcheckquota. Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Oesterlin, Robert Sent: Monday, August 19, 2019 9:54 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Thanks, - I kicked it off and it finished in about 12 hours, so much quicker than I expected. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Monday, August 19, 2019 at 8:24 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Bob, like most questions of this time I think the answer depends on a number of variables. Generally we do not recommend running the mmcheckquota command during the peak usage of your Spectrum Scale system. As I think you know the command will increase the IO to the NSDs that hold metadata and the number of NSDs that hold metadata will contribute to the time it takes for the command to complete, i.e. more metadata NSDs should improve the overall execution time. Regards, The Spectrum Scale (GPFS) team -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Aug 20 12:06:51 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 20 Aug 2019 11:06:51 +0000 Subject: [gpfsug-discuss] Spectrum Scale Technote mmap Message-ID: Hallo All, can everyone clarify the effected Level?s in witch ptf is the problem and in witch is not. The Abstract mean for v5.0.3.0 to 5.0.3.2. But in the content it says 5.0.3.0 to 5.0.3.3? 
https://www-01.ibm.com/support/docview.wss?uid=ibm10960396

Regards Renar

Renar Grunenberg
Abteilung Informatik - Betrieb
HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon: 09561 96-44110
Telefax: 09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet: www.huk.de
________________________________
HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer, Sarah Rössler, Daniel Thomas.
________________________________
Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________

From scale at us.ibm.com Tue Aug 20 13:39:30 2019
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Tue, 20 Aug 2019 08:39:30 -0400
Subject: [gpfsug-discuss] Spectrum Scale Technote mmap
In-Reply-To:
References:
Message-ID:

Since Spectrum Scale 5.0.3.3 has not yet been released I think the reference to it in the notice was incorrect. It should have referred to version 5.0.3.2 as it does in other statements. Thanks for noting the discrepancy. I will alert the appropriate folks so this can be fixed.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 .

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: "Grunenberg, Renar"
To: "'gpfsug-discuss at spectrumscale.org'"
Date: 08/20/2019 07:14 AM
Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Technote mmap
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hello all, can someone clarify the affected levels: in which PTF is the problem present and in which is it not? The abstract says v5.0.3.0 to 5.0.3.2, but the content says 5.0.3.0 to 5.0.3.3?
https://www-01.ibm.com/support/docview.wss?uid=ibm10960396

Regards Renar

Renar Grunenberg
Abteilung Informatik - Betrieb
HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon: 09561 96-44110
Telefax: 09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet: www.huk.de
HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G.
in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer, Sarah Rössler, Daniel Thomas.

Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From novosirj at rutgers.edu Wed Aug 21 17:03:12 2019
From: novosirj at rutgers.edu (Ryan Novosielski)
Date: Wed, 21 Aug 2019 16:03:12 +0000
Subject: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x
Message-ID:

I posted this on Slack, but it's serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks!

A little late, but we ran into a relatively serious problem at our site with 5.0.2.3. The symptom is an mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support:

2. This is a known defect.
The problem has been fixed through
D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock
A companion fix is
D.1073753: Assert that the lock mode in DirLTE::lock is strong enough

The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen."

The fix was setting disableAssert="dirop.C:4548", which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I've seen them listed X.X.X.X.X.X, X.X.X-X.X, and others).

--
____
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State | Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr.
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From scale at us.ibm.com Wed Aug 21 18:10:47 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 21 Aug 2019 13:10:47 -0400 Subject: [gpfsug-discuss] =?utf-8?q?mmfsd_segfault/signal_6_on_dirop=2EC?= =?utf-8?b?OjQ1NDggaW4gR1BGUwk1LjAuMi54?= In-Reply-To: References: Message-ID: As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ryan Novosielski To: gpfsug main discussion list Date: 08/21/2019 12:04 PM Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: 2. This is a known defect. The problem has been fixed through D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock A companion fix is D.1073753: Assert that the lock mode in DirLTE::lock is strong enough The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=2DWKJiKyweSkGrSB_31bZQerI4xIgc6Z_Pw7iTGZpH4&s=oLoaU67CVtDLGyv_LZO8AqZRU739wj1q-PysELBsBow&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Aug 21 18:13:44 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 21 Aug 2019 17:13:44 +0000 Subject: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x In-Reply-To: References: Message-ID: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 12:04 PM > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: > > 2. This is a known defect. > The problem has been fixed through > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > A companion fix is > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... 
The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Wed Aug 21 18:20:57 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 21 Aug 2019 13:20:57 -0400 Subject: [gpfsug-discuss] =?utf-8?q?mmfsd_segfault/signal_6_on_dirop=2EC?= =?utf-8?b?OjQ1NDggaW4JR1BGUwk1LjAuMi54?= In-Reply-To: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> References: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> Message-ID: To my knowledge there has been no notification sent regarding this problem. Generally we only notify customers about problems that impact file system data corruption or data loss. This problem does cause the GPFS instance to abort and restart (assert) but it does not impact file system data. It seems in your case you may have been encountering the problem frequently. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ryan Novosielski To: gpfsug main discussion list Date: 08/21/2019 01:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. 
Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 12:04 PM > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: > > 2. This is a known defect. > The problem has been fixed through > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > A companion fix is > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=Vb5n4LNvsDMO5ku78vPAo49t9F87b7EsHcty3P7vTwo&s=yjhPrnHKQtpBzlJ-8gaCH4CHoo6pMXFMf1vLsCFg9WU&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=Vb5n4LNvsDMO5ku78vPAo49t9F87b7EsHcty3P7vTwo&s=yjhPrnHKQtpBzlJ-8gaCH4CHoo6pMXFMf1vLsCFg9WU&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=Vb5n4LNvsDMO5ku78vPAo49t9F87b7EsHcty3P7vTwo&s=yjhPrnHKQtpBzlJ-8gaCH4CHoo6pMXFMf1vLsCFg9WU&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Aug 21 18:34:03 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 21 Aug 2019 17:34:03 +0000 Subject: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x In-Reply-To: References: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> Message-ID: <6170DFBB-070A-4C72-816C-4D4F3D8B77FF@rutgers.edu> If there is any means for feedback, I really think that anything that causes a crash of mmfsd is absolutely cause to send a notice. Regardless of data corruption, it makes the software unusable in production under certain circumstances. There was a large customer impact at our site. We have a reproducible case if it is useful. One customer workload crashed every time, though it took almost a full day to get to that point so you can imagine the time wasted. > On Aug 21, 2019, at 1:20 PM, IBM Spectrum Scale wrote: > > To my knowledge there has been no notification sent regarding this problem. Generally we only notify customers about problems that impact file system data corruption or data loss. This problem does cause the GPFS instance to abort and restart (assert) but it does not impact file system data. It seems in your case you may have been encountering the problem frequently. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 01:14 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > From: Ryan Novosielski > > To: gpfsug main discussion list > > Date: 08/21/2019 12:04 PM > > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: > > > > 2. This is a known defect. > > The problem has been fixed through > > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > > A companion fix is > > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). 
> > > > -- > > ____ > > || \\UTGERS, |---------------------------*O*--------------------------- > > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > > `' > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Wed Aug 21 18:46:40 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 21 Aug 2019 13:46:40 -0400 Subject: [gpfsug-discuss] =?utf-8?q?mmfsd_segfault/signal_6_on_dirop=2EC?= =?utf-8?b?OjQ1NDgJaW4JR1BGUwk1LjAuMi54?= In-Reply-To: <6170DFBB-070A-4C72-816C-4D4F3D8B77FF@rutgers.edu> References: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> <6170DFBB-070A-4C72-816C-4D4F3D8B77FF@rutgers.edu> Message-ID: We do appreciate the feedback. Since Spectrum Scale is a cluster based solution we do not consider the failure of a single node significant since the cluster will adjust to the loss of the node and access to the file data is not lost. It seems in this specific instance this problem was having a more significant impact in your environment. Presumably you have installed the available fix and are no longer encountering the problem. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ryan Novosielski To: gpfsug main discussion list Date: 08/21/2019 01:34 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org If there is any means for feedback, I really think that anything that causes a crash of mmfsd is absolutely cause to send a notice. Regardless of data corruption, it makes the software unusable in production under certain circumstances. There was a large customer impact at our site. We have a reproducible case if it is useful. One customer workload crashed every time, though it took almost a full day to get to that point so you can imagine the time wasted. > On Aug 21, 2019, at 1:20 PM, IBM Spectrum Scale wrote: > > To my knowledge there has been no notification sent regarding this problem. 
Generally we only notify customers about problems that impact file system data corruption or data loss. This problem does cause the GPFS instance to abort and restart (assert) but it does not impact file system data. It seems in your case you may have been encountering the problem frequently. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 01:14 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > From: Ryan Novosielski > > To: gpfsug main discussion list > > Date: 08/21/2019 12:04 PM > > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. 
From IBM Support: > > > > 2. This is a known defect. > > The problem has been fixed through > > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > > A companion fix is > > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). > > > > -- > > ____ > > || \\UTGERS, |---------------------------*O*--------------------------- > > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > > `' > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From novosirj at rutgers.edu Wed Aug 21 18:54:28 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 21 Aug 2019 17:54:28 +0000 Subject: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x In-Reply-To: References: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> <6170DFBB-070A-4C72-816C-4D4F3D8B77FF@rutgers.edu> Message-ID: <8D9E4CAF-8B31-42C5-82B3-3B5EE4D132D3@rutgers.edu> There again, I would disagree. It is true that any one occurrence would affect one node, but it?s quite possible to conceive of a scenario where it happened on all nodes at around the same time, if conducting the right operations. We haven?t installed the EFIX (we have Lenovo equipment and generally use the Lenovo software ? not sure how that fits in here), but we applied the described workaround (disableAssert=dirop.C:4548) and are planning to upgrade our clients as soon as we can get 5.0.3.3 software (since <5.0.3.3 contains the other mmap read bug). > On Aug 21, 2019, at 1:46 PM, IBM Spectrum Scale wrote: > > We do appreciate the feedback. Since Spectrum Scale is a cluster based solution we do not consider the failure of a single node significant since the cluster will adjust to the loss of the node and access to the file data is not lost. It seems in this specific instance this problem was having a more significant impact in your environment. Presumably you have installed the available fix and are no longer encountering the problem. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 01:34 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > If there is any means for feedback, I really think that anything that causes a crash of mmfsd is absolutely cause to send a notice. Regardless of data corruption, it makes the software unusable in production under certain circumstances. There was a large customer impact at our site. We have a reproducible case if it is useful. One customer workload crashed every time, though it took almost a full day to get to that point so you can imagine the time wasted. > > > On Aug 21, 2019, at 1:20 PM, IBM Spectrum Scale wrote: > > > > To my knowledge there has been no notification sent regarding this problem. Generally we only notify customers about problems that impact file system data corruption or data loss. This problem does cause the GPFS instance to abort and restart (assert) but it does not impact file system data. It seems in your case you may have been encountering the problem frequently. 
> > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > From: Ryan Novosielski > > To: gpfsug main discussion list > > Date: 08/21/2019 01:14 PM > > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. > > > > -- > > ____ > > || \\UTGERS, |---------------------------*O*--------------------------- > > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > > `' > > > > > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > > > > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > > > > > Regards, The Spectrum Scale (GPFS) team > > > > > > ------------------------------------------------------------------------------------------------------------------ > > > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > > > > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > > > > > From: Ryan Novosielski > > > To: gpfsug main discussion list > > > Date: 08/21/2019 12:04 PM > > > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > > > > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: > > > > > > 2. This is a known defect. 
> > > The problem has been fixed through > > > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > > > A companion fix is > > > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > > > > > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > > > > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). > > > > > > -- > > > ____ > > > || \\UTGERS, |---------------------------*O*--------------------------- > > > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > > > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > > > `' > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Craig.Abram at gmfinancial.com Thu Aug 29 14:58:36 2019 From: Craig.Abram at gmfinancial.com (Craig.Abram at gmfinancial.com) Date: Thu, 29 Aug 2019 13:58:36 +0000 Subject: [gpfsug-discuss] Backup question Message-ID: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Are there any other options to backup up GPFS other that Spectrum Protect ? ________________________________ Notice to all users The information contained in this email, including any attachment(s) is confidential and intended solely for the addressee and may contain privileged, confidential or restricted information. If you are not the intended recipient or responsible to deliver to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you received this message in error please notify the originator and then delete. Neither, the sender or GMF's network will be liable for direct, indirect or consequential infection by viruses associated with this email. 
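As the replies below note, mmbackup with Spectrum Protect is the integrated route, and a policy-generated file list can feed most third-party backup tools. A rough sketch of both, assuming a file system named fs1; the rule names, the file /tmp/backup.pol, the list prefix /tmp/cand and the one-day change window are only illustrative placeholders:

# integrated route (assumes a Spectrum Protect client is already configured):
mmbackup fs1 -t incremental

# /tmp/backup.pol - list every file changed in roughly the last day
RULE EXTERNAL LIST 'tobackup' EXEC ''
RULE 'changed' LIST 'tobackup' WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) <= 1

# generate the candidate list without executing anything, then hand
# /tmp/cand.list.tobackup to whatever backup tool is in use
mmapplypolicy fs1 -P /tmp/backup.pol -f /tmp/cand -I defer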
-------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Aug 29 15:13:14 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 29 Aug 2019 14:13:14 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: You can backup GPFS with basically anything that can read a POSIX filesystem. Do you refer to refer to mmbackup integration? -- Cheers > On 29 Aug 2019, at 17.09, Craig.Abram at gmfinancial.com wrote: > > > > Are there any other options to backup up GPFS other that Spectrum Protect ? > > > > Notice to all users The information contained in this email, including any attachment(s) is confidential and intended solely for the addressee and may contain privileged, confidential or restricted information. If you are not the intended recipient or responsible to deliver to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you received this message in error please notify the originator and then delete. Neither, the sender or GMF's network will be liable for direct, indirect or consequential infection by viruses associated with this email. Ellei edell?? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Aug 29 15:14:34 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 29 Aug 2019 14:14:34 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: An HTML attachment was scrubbed... URL: From oehmes at gmail.com Thu Aug 29 15:19:23 2019 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 29 Aug 2019 07:19:23 -0700 Subject: [gpfsug-discuss] Backup question In-Reply-To: References: Message-ID: <08F5DE8C-9C95-45D8-8711-229D76985329@gmail.com> while it is true that you can backup the data with everything that can read a POSIX filesystem, you will miss all the metadata associated like extended attributes and ACL?s. beside mmbackup (which uses spectrum protect) DDN also offers a product for data management including backup/restore (but also migration and other scenarios) that preserves the metadata information. you can get more info here ?> https://www.ddn.com/products/dataflow-backup-archive-data-migration/ sven Sent from my iPad > On Aug 29, 2019, at 7:13 AM, Luis Bolinches wrote: > > You can backup GPFS with basically anything that can read a POSIX filesystem. > > Do you refer to refer to mmbackup integration? > > -- > Cheers > >> On 29 Aug 2019, at 17.09, Craig.Abram at gmfinancial.com wrote: >> >> >> >> Are there any other options to backup up GPFS other that Spectrum Protect ? >> >> >> >> Notice to all users The information contained in this email, including any attachment(s) is confidential and intended solely for the addressee and may contain privileged, confidential or restricted information. If you are not the intended recipient or responsible to deliver to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you received this message in error please notify the originator and then delete. 
Neither, the sender or GMF's network will be liable for direct, indirect or consequential infection by viruses associated with this email. > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Aug 30 03:30:22 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 29 Aug 2019 19:30:22 -0700 Subject: [gpfsug-discuss] Hold the Date - September 23 and 24 In-Reply-To: <3F2B08E9-C6E3-412B-9308-D79E3480C5DA@lbl.gov> References: <3F2B08E9-C6E3-412B-9308-D79E3480C5DA@lbl.gov> Message-ID: <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> Hello, You will now find the nearly complete agenda here: https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ As noted before, the event is free, but please do register below to help with catering planning. You can find more information about the full HPCXXL event here: http://hpcxxl.org/ Any questions let us know. Hope to see you there! -Kristy > On Jul 2, 2019, at 10:45 AM, Kristy Kallback-Rose wrote: > > Hello, > > HPCXXL will be hosted by NERSC (Berkeley, CA) this September. As part of this event, there will be approximately a day and a half on GPFS content. We have done this type of event in the past, and as before, the GPFS days will be free to attend, but you do need to register. > > We?ll have more details soon, mark your calendars. > > Initial details: https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ > > Best, > Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Fri Aug 30 09:07:24 2019 From: jtucker at pixitmedia.com (Jez Tucker) Date: Fri, 30 Aug 2019 09:07:24 +0100 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: Hey ho, ? You may wish to evaluate NetBackup (I have not). https://www.veritas.com/content/support/en_US/doc/18716246-126559472-0/v107946958-126559472 You can accelerate most 3rd party backup solutions by driving file lists via policy. With a bit of additional development, it's not hard to achieve something extremely close to mmbackup. Perhaps the prudent question is; - What issue(s) make Spectrum Protect not the first solution of choice? Best, Jez On 29/08/2019 14:58, Craig.Abram at gmfinancial.com wrote: > > Are there any other options to backup up GPFS other that Spectrum > Protect ? > > > ------------------------------------------------------------------------ > > Notice to all users The information contained in this email, including > any attachment(s) is confidential and intended solely for the > addressee and may contain privileged, confidential or restricted > information. If you are not the intended recipient or responsible to > deliver to the intended recipient, you are hereby notified that any > dissemination, distribution or copying of this communication is > strictly prohibited. If you received this message in error please > notify the originator and then delete. 
Neither, the sender or GMF's > network will be liable for direct, indirect or consequential infection > by viruses associated with this email. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Aug 30 14:16:17 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 30 Aug 2019 09:16:17 -0400 Subject: [gpfsug-discuss] Backup question In-Reply-To: References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: Certainly fair and understandable to look at alternatives to the IBM backup/restore/HSM products and possibly mixing-matching. But thanks, Jez for raising the question: What do you see as strengths and weaknesses of the alternatives, IBM and others? AND as long as you are considering alternatives, here's another: HPSS http://www.hpss-collaboration.org/ghi.shtml which, from the beginning, was designed to work with GPFS. From: Jez Tucker To: gpfsug-discuss at spectrumscale.org Date: 08/30/2019 04:07 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Backup question Sent by: gpfsug-discuss-bounces at spectrumscale.org Hey ho, ? You may wish to evaluate NetBackup (I have not). https://www.veritas.com/content/support/en_US/doc/18716246-126559472-0/v107946958-126559472 You can accelerate most 3rd party backup solutions by driving file lists via policy. With a bit of additional development, it's not hard to achieve something extremely close to mmbackup. Perhaps the prudent question is; - What issue(s) make Spectrum Protect not the first solution of choice? Best, Jez On 29/08/2019 14:58, Craig.Abram at gmfinancial.com wrote: Are there any other options to backup up GPFS other that Spectrum Protect ? Notice to all users The information contained in this email, including any attachment(s) is confidential and intended solely for the addressee and may contain privileged, confidential or restricted information. If you are not the intended recipient or responsible to deliver to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you received this message in error please notify the originator and then delete. Neither, the sender or GMF's network will be liable for direct, indirect or consequential infection by viruses associated with this email. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Jez Tucker Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From daniel.kidger at uk.ibm.com Fri Aug 30 16:07:45 2019 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Fri, 30 Aug 2019 15:07:45 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: References: , <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=0ABB0EF5DFDB98B68f9e8a93df938690918c0AB at .gif Type: image/gif Size: 105 bytes Desc: not available URL:
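For anyone who wants to try the policy-driven route described earlier in this thread, a minimal sketch of generating file lists for a third-party backup tool might look like the following; the filesystem path, policy file name, list prefix and one-day change window are illustrative assumptions, not details from the original posts:

/* tobackup.policy -- build candidate lists only, take no action */
rule 'xlist' external list 'tobackup' exec ''
rule 'changed' list 'tobackup'
     where (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) <= 1

# write the lists under the -f prefix and stop there
mmapplypolicy /gpfs/fs1 -P tobackup.policy -I defer -f /gpfs/fs1/.lists/tobackup -L 1

With EXEC '' and -I defer, mmapplypolicy leaves the generated candidate lists on disk instead of invoking a script, and the backup product is then pointed at those lists rather than walking the whole namespace itself.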
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From robert at strubi.ox.ac.uk Fri Aug 9 14:29:14 2019 From: robert at strubi.ox.ac.uk (Robert Esnouf) Date: Fri, 09 Aug 2019 14:29:14 +0100 Subject: [gpfsug-discuss] relion software using GPFS storage In-Reply-To: References: Message-ID: <230cab76cb83cd153d2f9ed33c1e47e5@strubi.ox.ac.uk> Dear Wei, Not a lot on information to go on there... e.g. layout of the MPI processes on compute nodes, the interconnect and the GPFS settings... but the standout information appears to be: "10X slower than local SSD, and nfs reexport of another gpfs filesystem" "The per process IO is very slow, 4-5 MiB/s, while on ssd and nfs I got 20-40 MiB/s" You also not 2GB/s performance for 4MB writes, and 1.7GB/s read. That is only 500 IOPS, I assume you'd see more with 4kB reads/writes. I'd also note that 10x slower is kind of an intermediate number, its bad but not totally unproductive. I think the likely issues are going to be around the GPFS (client) config, although you might also be struggling with IOPS. The fact that the NFS re-export trick works (allowing O/S-level lazy caching and instant re-opening of files) suggests that total performance is not your issue. Upping the pagepool and/or maxStatCache etc may just make all these issues go away. If I picked out the right benchmark, then it is one with a 360 box size which is not too small... I don't know how many files comprise your particle set... Regards, Robert -- Dr Robert Esnouf University Research Lecturer, Director of Research Computing BDI, Head of Research Computing Core WHG, NDM Research Computing Strategy Officer Main office: Room 10/028, Wellcome Centre for Human Genetics, Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK Emails: robert at strubi.ox.ac.uk / robert at well.ox.ac.uk / robert.esnouf at bdi.ox.ac.uk Tel: (+44)-1865-287783 (WHG); (+44)-1865-743689 (BDI) ? ----- Original Message ----- From: Guo, Wei (Wei.Guo at STJUDE.ORG) Date: 08/08/19 23:19 To: gpfsug-discuss at spectrumscale.org, robert at strubi.ox.ac.uk, robert at well.ox.ac.uk, robert.esnouf at bid.ox.ac.uk Subject: [gpfsug-discuss] relion software using GPFS storage Hi, Robert and Michael, What are the settings within relion for parallel file systems? Sorry to bump this old threads, as I don't see any further conversation, and I cannot join the mailing list recently due to the spectrumscale.org:10000 web server error. I used to be in this mailing list with my previous work (email). The problem is I also see Relion 3 does not like GPFS. It is obscenely slow, slower than anything... local ssd, nfs reexport of gpfs. I am using the standard benchmarks from Relion 3 website. The mpirun -n 9 `which relion_refine_mpi` is 10X slower than local SSD, and nfs reexport of another gpfs filesystem. The latter two I can get close results (1hr25min) as compared with the publish results (1hr13min) on the same Intel Xeon Gold 6148 CPU @2.40GHz and 4 V100 GPU cards, with the same command. Running the same standard benchmark it takes 15-20 min for one iteration, should be <1.7 mins. The per process IO is very slow, 4-5 MiB/s, while on ssd and nfs I got 20-40 MiB/s if watching the /proc//io of the relion_refine processes. My gpfs client can see ~2GB/s when benchmarking with IOZONE, yes, 2GB/s because of small system, 70? drives. Record Size 4096 kB O_DIRECT feature enabled File size set to 20971520 kB Command line used: iozone -r 4m -I -t 16 -s 20g Output is in kBytes/sec Time Resolution = 0.000001 seconds. 
Processor cache size set to 1024 kBytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. Throughput test with 16 processes Each process writes a 20971520 kByte file in 4096 kByte records Children see throughput for 16 initial writers = 1960218.38 kB/sec Parent sees throughput for 16 initial writers = 1938463.07 kB/sec Min throughput per process = ?120415.66 kB/sec? Max throughput per process = ?123652.07 kB/sec Avg throughput per process = ?122513.65 kB/sec Min xfer = 20426752.00 kB Children see throughput for 16 readers = 1700354.00 kB/sec Parent sees throughput for 16 readers = 1700046.71 kB/sec Min throughput per process = ?104587.73 kB/sec? Max throughput per process = ?108182.84 kB/sec Avg throughput per process = ?106272.12 kB/sec Min xfer = 20275200.00 kB The --no_parallel_disk_io is even worse. --only_do_unfinished_movies does not help much. Please advise. Thanks Wei Guo Computational Engineer, St Jude Children's Research Hospital wei.guo at stjude.org Dear Michael, There are settings within relion for parallel file systems, you should check they are enabled if you have SS underneath. Otherwise, check which version of relion and then try to understand the problem that is being analysed a little more. If the box size is very small and the internal symmetry low then the user may read 100,000s of small "picked particle" files for each iteration opening and closing the files each time. I believe that relion3 has some facility for extracting these small particles from the larger raw images and that is more SS-friendly. Alternatively, the size of the set of picked particles is often only in 50GB range and so staging to one or more local machines is quite feasible... Hope one of those suggestions helps. Regards, Robert -- Dr Robert Esnouf University Research Lecturer, Director of Research Computing BDI, Head of Research Computing Core WHG, NDM Research Computing Strategy Officer Main office: Room 10/028, Wellcome Centre for Human Genetics, Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK Emails: robert at strubi.ox.ac.uk / robert at well.ox.ac.uk / robert.esnouf at bdi.ox.ac.uk Tel: (+44)-1865-287783 (WHG); (+44)-1865-743689 (BDI) ? -----Original Message----- From: "Michael Holliday" To: gpfsug-discuss at spectrumscale.org Date: 27/02/19 12:21 Subject: [gpfsug-discuss] relion software using GPFS storage Hi All, ? We?ve recently had an issue where a job on our client GPFS cluster caused out main storage to go extremely slowly.? ?The job was running relion using MPI (https://www2.mrc-lmb.cam.ac.uk/relion/index.php?title=Main_Page) ? It caused waiters across the cluster, and caused the load to spike on NSDS on at a time.? When the spike ended on one NSD, it immediately started on another.? ? There were no obvious errors in the logs and the issues cleared immediately after the job was cancelled.? ? Has anyone else see any issues with relion using GPFS storage? ? Michael ? Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing STP | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 ? The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 
06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Email Disclaimer: www.stjude.org/emaildisclaimer Consultation Disclaimer: www.stjude.org/consultationdisclaimer -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.mamach at northwestern.edu Fri Aug 9 18:46:34 2019 From: alex.mamach at northwestern.edu (Alexander John Mamach) Date: Fri, 9 Aug 2019 17:46:34 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles Message-ID: Hi folks, We?re currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Fri Aug 9 19:03:09 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 9 Aug 2019 18:03:09 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Aug 9 19:09:18 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 9 Aug 2019 18:09:18 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: Message-ID: <3918E7F4-D499-489B-9D23-22B5C456D637@rutgers.edu> I?ve needed the same sort of thing ? we use NHC to check for FS status and we?ve had cases that we were not able to detect because they were in this ?Stale file handle? state. GPFS doesn?t always seem to behave in the way Linux would expect. > On Aug 9, 2019, at 2:03 PM, Frederick Stock wrote: > > Are you able to explain why you want to check for stale file handles? Are you attempting to detect failures of some sort, and why do the existing mechanisms in GPFS not provide the functionality you require? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > ----- Original message ----- > From: Alexander John Mamach > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Checking for Stale File Handles > Date: Fri, Aug 9, 2019 1:46 PM > > Hi folks, > > We?re currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. > > Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. 
> > Thanks, > > Alex > > Senior Systems Administrator > > Research Computing Infrastructure > Northwestern University Information Technology (NUIT) > > 2020 Ridge Ave > Evanston, IL 60208-4311 > > O: (847) 491-2219 > M: (312) 887-1881 > www.it.northwestern.edu > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From rmoye at quantlab.com Fri Aug 9 19:03:05 2019 From: rmoye at quantlab.com (Roger Moye) Date: Fri, 9 Aug 2019 18:03:05 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: Message-ID: <5db3a0bf06754c73b83a50db0b577847@quantlab.com> You might try something like: timeout --kill-after=5 --signal=SIGKILL 5 test -d /some/folder/on/gpfs [cid:image001.png at 01D22319.C7D5D540] Roger Moye HPC Engineer 713.425.6236 Office 713.898.0021 Mobile QUANTLAB Financial, LLC 3 Greenway Plaza Suite 200 Houston, Texas 77046 www.quantlab.com From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Alexander John Mamach Sent: Friday, August 9, 2019 12:47 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Checking for Stale File Handles Hi folks, We're currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 3364 bytes Desc: image001.png URL: From ewahl at osc.edu Fri Aug 9 19:54:48 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 9 Aug 2019 18:54:48 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: Message-ID: We use NHC here (Node Health Check) from LBNL and our SS clients are almost all using NFS root. We have a check where we look for access to a couple of dotfiles (we have multiple SS file systems) and will mark a node offline if the checks fail. Many things can contribute to the failure of a single client node as we all know. Our checks are for actual node health on the clients, NOT to assess the health of the File Systems themselves. I will normally see MANY other problems from other monitoring sources long before I normally see stale file handles at the client level. We did have to turn up the timeout for a check of the file to return on very busy clients, but we've haven't seen slowdowns due to hundreds of nodes all checking the file at the same time. Localized node slowdowns will occasionally mark a node offline for this check here and there (normally a node that is extremely busy), but the next check will put the node right back online in the batch system. Ed Wahl Ohio Supercomputer Center ewahl at osc.edu ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Alexander John Mamach Sent: Friday, August 9, 2019 1:46 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Checking for Stale File Handles Hi folks, We?re currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.mamach at northwestern.edu Fri Aug 9 21:32:49 2019 From: alex.mamach at northwestern.edu (Alexander John Mamach) Date: Fri, 9 Aug 2019 20:32:49 +0000 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: , Message-ID: Hi Fred, We sometimes find a node will show that GPFS is active when running mmgetstate, but one of our GPFS filesystems, (such as our home or projects filesystems) are inaccessible to users, while the other GPFS-mounted filesystems behave as expected. Our current node health checks don?t always detect this, especially when it?s for a resource-based mount that doesn?t impact the node but would impact jobs trying to run on the node. If there is something native to GPFS that can detect this, all the better, but I?m simply unaware of how to do so. 
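A low-impact variant of the timeout test suggested earlier in the thread, looped over whatever GPFS mounts a node actually has, would be roughly the following (a sketch only, untested here; the timeout values are arbitrary):

while read -r dev mnt fstype rest; do
    [ "$fstype" = "gpfs" ] || continue
    timeout -k 5 5 stat -t "$mnt" >/dev/null 2>&1 || echo "$(hostname -s): $mnt stale or unresponsive"
done < /proc/mounts

A genuinely stale handle fails almost immediately with ESTALE, while a slow but healthy filesystem only trips the check if it cannot answer a single stat within the timeout.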
Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederick Stock Sent: Friday, August 9, 2019 1:03:09 PM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Checking for Stale File Handles Are you able to explain why you want to check for stale file handles? Are you attempting to detect failures of some sort, and why do the existing mechanisms in GPFS not provide the functionality you require? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Alexander John Mamach Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking for Stale File Handles Date: Fri, Aug 9, 2019 1:46 PM Hi folks, We?re currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Mon Aug 12 09:30:14 2019 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Mon, 12 Aug 2019 10:30:14 +0200 Subject: [gpfsug-discuss] Checking for Stale File Handles In-Reply-To: References: , Message-ID: Hi Alex, did you try mmhealth ? It should detect stale file handles of the gpfs filesystems already and report a "stale_mount" event. 
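For example, on a suspect node (exact component and option names can vary a little between releases):

mmhealth node show FILESYSTEM
mmhealth node eventlog | grep -i stale_mount
mmhealth cluster show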
Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development - Release Lead Architect (4.2.x) Spectrum Scale RAS Architect --------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49 70342744105 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ----------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk WittkoppSitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Alexander John Mamach To: gpfsug main discussion list Cc: "gpfsug-discuss at spectrumscale.org" Date: 09/08/2019 22:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking for Stale File Handles Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Fred, We sometimes find a node will show that GPFS is active when running mmgetstate, but one of our GPFS filesystems, (such as our home or projects filesystems) are inaccessible to users, while the other GPFS-mounted filesystems behave as expected. Our current node health checks don?t always detect this, especially when it?s for a resource-based mount that doesn?t impact the node but would impact jobs trying to run on the node. If there is something native to GPFS that can detect this, all the better, but I?m simply unaware of how to do so. Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederick Stock Sent: Friday, August 9, 2019 1:03:09 PM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Checking for Stale File Handles Are you able to explain why you want to check for stale file handles? Are you attempting to detect failures of some sort, and why do the existing mechanisms in GPFS not provide the functionality you require? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Alexander John Mamach Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking for Stale File Handles Date: Fri, Aug 9, 2019 1:46 PM Hi folks, We?re currently investigating a way to check for stale file handles on the nodes across our cluster in a way that minimizes impact to the filesystem and performance. Has anyone found a direct way of doing so? We considered a few methods, including simply attempting to ls a GPFS filesystem from each node, but that might have false positives, (detecting slowdowns as stale file handles), and could negatively impact performance with hundreds of nodes doing this simultaneously. 
Thanks, Alex Senior Systems Administrator Research Computing Infrastructure Northwestern University Information Technology (NUIT) 2020 Ridge Ave Evanston, IL 60208-4311 O: (847) 491-2219 M: (312) 887-1881 www.it.northwestern.edu _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=9dCEbNr27klWay2AcOfvOE1xq50K-CyRUu4qQx4HOlk&m=sUjgq9g2p2ncIpALAqAhOqt7blwynTJmgmFdYYik7MI&s=EFC3lNuf6koYPMPSWuYCNhwmIMUKKZ9mCQFhxVCYWLQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Aug 12 11:42:49 2019 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 12 Aug 2019 12:42:49 +0200 Subject: [gpfsug-discuss] Fileheat Message-ID: Hello, I am having difficulties with Spectrum Scale's fileheat feature on Spectrum Scale 5.0.2/5.0.3: The config has it activated: # mmlsconfig | grep fileHeat fileHeatPeriodMinutes 720 Now everytime I look at the files using mmapplypolicy I only see 0 for the fileheat. I have both tried reading files via nfs and locally. No difference, the fileheat always stays at zero. What could be wrong here? How to debug? We are exporting the filesystem using kernel NFS which is working fine. However, the documentation states that root access is not taken into account for fileheat, so I am wondering if that setup is supposed to work at all? Thx, Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From alvise.dorigo at psi.ch Mon Aug 12 14:03:47 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 12 Aug 2019 13:03:47 +0000 Subject: [gpfsug-discuss] AFM and SELinux Message-ID: <83A6EEB0EC738F459A39439733AE80452BE96124@MBX114.d.ethz.ch> Dear GPFS users, does anybody know if AFM behaves correctly if the AFM gateway has SELinux "Disabled" and NFS server has SElinux "Enforcing" ? thanks, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Aug 12 14:38:59 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 12 Aug 2019 09:38:59 -0400 Subject: [gpfsug-discuss] Fileheat In-Reply-To: References: Message-ID: My Admin guide says: The loss percentage and period are set via the configuration variables fileHeatLossPercent and fileHeatPeriodMinutes. By default, the file access temperature is not tracked. To use access temperature in policy, the tracking must first be enabled. To do this, set the two configuration variables as follows: fileHeatLossPercent The percentage (between 0 and 100) of file access temperature dissipated over the fileHeatPeriodMinutes time. The default value is 10. Chapter 25. Information lifecycle management for IBM Spectrum Scale 361 fileHeatPeriodMinutes The number of minutes defined for the recalculation of file access temperature. 
To turn on tracking, fileHeatPeriodMinutes must be set to a nonzero value. The default value is 0 SO Try setting both! ALSO to take effect you may have to mmshutdown and mmstartup, at least on the (client gpfs) nodes that are accessing the files of interest. From: Ulrich Sibiller To: gpfsug-discuss at gpfsug.org Date: 08/12/2019 06:50 AM Subject: [EXTERNAL] [gpfsug-discuss] Fileheat Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I am having difficulties with Spectrum Scale's fileheat feature on Spectrum Scale 5.0.2/5.0.3: The config has it activated: # mmlsconfig | grep fileHeat fileHeatPeriodMinutes 720 Now everytime I look at the files using mmapplypolicy I only see 0 for the fileheat. I have both tried reading files via nfs and locally. No difference, the fileheat always stays at zero. What could be wrong here? How to debug? We are exporting the filesystem using kernel NFS which is working fine. However, the documentation states that root access is not taken into account for fileheat, so I am wondering if that setup is supposed to work at all? Thx, Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=vQ1ASSRY5HseAqfNFONyHvd4crfRlWttZe2Uti0rx1s&s=Q7wAWezSHse5uPfvwobmcmASiGvpLfbKy97sqRkvJ-M&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From u.sibiller at science-computing.de Tue Aug 13 10:22:27 2019 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Tue, 13 Aug 2019 11:22:27 +0200 Subject: [gpfsug-discuss] Fileheat In-Reply-To: References: Message-ID: On 12.08.19 15:38, Marc A Kaplan wrote: > My Admin guide says: > > The loss percentage and period are set via the configuration > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By default, the file access temperature > is not > tracked. To use access temperature in policy, the tracking must first be enabled. To do this, set > the two > configuration variables as follows:* Yes, I am aware of that. > fileHeatLossPercent* > The percentage (between 0 and 100) of file access temperature dissipated over the* > fileHeatPeriodMinutes *time. The default value is 10. > Chapter 25. Information lifecycle management for IBM Spectrum Scale *361** > fileHeatPeriodMinutes* > The number of minutes defined for the recalculation of file access temperature. To turn on > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. The default value is 0 > > > SO Try setting both! Well, I have not because the documentation explicitly mentions a default. What's the point of a default if I have to explicitly configure it? 
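For the record, explicitly enabling the tracking discussed here comes down to something like the following, with the daemon restart mentioned below still needed on the nodes doing the reads:

mmchconfig fileHeatPeriodMinutes=720,fileHeatLossPercent=10
mmlsconfig | grep -i fileheat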
> ALSO to take effect you may have to mmshutdown and mmstartup, at least on the (client gpfs) nodes > that are accessing the files of interest. I have now configured both parameters and restarted GPFS. Ran a tar over a directory - still no change. I will wait for 720minutes and retry (tomorrow). Thanks Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From janfrode at tanso.net Tue Aug 13 11:22:13 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 13 Aug 2019 12:22:13 +0200 Subject: [gpfsug-discuss] Fileheat In-Reply-To: References: Message-ID: What about filesystem atime updates. We recently changed the default to ?relatime?. Could that maybe influence heat tracking? -jf tir. 13. aug. 2019 kl. 11:29 skrev Ulrich Sibiller < u.sibiller at science-computing.de>: > On 12.08.19 15:38, Marc A Kaplan wrote: > > My Admin guide says: > > > > The loss percentage and period are set via the configuration > > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By default, > the file access temperature > > is not > > tracked. To use access temperature in policy, the tracking must first be > enabled. To do this, set > > the two > > configuration variables as follows:* > > Yes, I am aware of that. > > > fileHeatLossPercent* > > The percentage (between 0 and 100) of file access temperature dissipated > over the* > > fileHeatPeriodMinutes *time. The default value is 10. > > Chapter 25. Information lifecycle management for IBM Spectrum Scale > *361** > > fileHeatPeriodMinutes* > > The number of minutes defined for the recalculation of file access > temperature. To turn on > > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. The > default value is 0 > > > > > > SO Try setting both! > > Well, I have not because the documentation explicitly mentions a default. > What's the point of a > default if I have to explicitly configure it? > > > ALSO to take effect you may have to mmshutdown and mmstartup, at least > on the (client gpfs) nodes > > that are accessing the files of interest. > > I have now configured both parameters and restarted GPFS. Ran a tar over a > directory - still no > change. I will wait for 720minutes and retry (tomorrow). > > Thanks > > Uli > > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Aug 13 14:32:53 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 13 Aug 2019 09:32:53 -0400 Subject: [gpfsug-discuss] Fileheat - does work! 
Complete test/example provided here. In-Reply-To: References: Message-ID:

Yes, you are correct. It should only be necessary to set fileHeatPeriodMinutes, since the loss percent does have a default value. But IIRC (I implemented part of this!) you must restart the daemon to get those fileheat parameter(s) "loaded" and initialized into the daemon processes. Not fully trusting my memory... I will now "prove" this works today as follows:

To test, create and re-read a large file with dd...

[root@/main/gpfs-git]$ mmchconfig fileHeatPeriodMinutes=60
mmchconfig: Command successfully completed
...
[root@/main/gpfs-git]$ mmlsconfig | grep -i heat
fileHeatPeriodMinutes 60
[root@/main/gpfs-git]$ mmshutdown
...
[root@/main/gpfs-git]$ mmstartup
...
[root@/main/gpfs-git]$ mmmount c23
...
[root@/main/gpfs-git]$ ls -l /c23/10g
-rw-r--r--. 1 root root 10737418240 May 16 15:09 /c23/10g
[root@/main/gpfs-git]$ mmlsattr -d -X /c23/10g
file name: /c23/10g
security.selinux
(NO fileheat attribute yet...)
[root@/main/gpfs-git]$ dd if=/c23/10g bs=1M of=/dev/null
...

After the command finishes, you may need to wait a while for the metadata to flush to the inode on disk ... or you can force that with an unmount or a mmfsctl... Then the fileheat attribute will appear (I just waited by answering another email... No need to do any explicit operations on the file system..)

[root@/main/gpfs-git]$ mmlsattr -d -X /c23/10g
file name: /c23/10g
security.selinux
gpfs.FileHeat

To see its hex string value:

[root@/main/gpfs-git]$ mmlsattr -d -X -L /c23/10g
file name: /c23/10g
...
security.selinux: 0x756E636F6E66696E65645F753A6F626A6563745F723A756E6C6162656C65645F743A733000
gpfs.FileHeat: 0x000000EE42A40400

Which will be interpreted by mmapplypolicy... YES, the interpretation is relative to last access time and current time, and done by a policy/sql function "computeFileHeat" (You could find this using m4 directives in your policy file...)

define([FILE_HEAT],[computeFileHeat(CURRENT_TIMESTAMP-ACCESS_TIME,xattr('gpfs.FileHeat'),KB_ALLOCATED)])

Well gone that far, might as well try mmapplypolicy too....

[root@/main/gpfs-git]$ cat /gh/policies/fileheat.policy
define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END])
rule fh1 external list 'fh' exec ''
rule fh2 list 'fh' weight(FILE_HEAT)
  show(DISPLAY_NULL(xattr_integer('gpfs.FileHeat',1,4,'B')) || ' ' ||
       DISPLAY_NULL(xattr_integer('gpfs.FileHeat',5,2,'B')) || ' ' ||
       DISPLAY_NULL(xattr_integer('gpfs.FileHeat',7,2,'B')) || ' ' ||
       DISPLAY_NULL(FILE_HEAT) || ' ' ||
       DISPLAY_NULL(hex(xattr('gpfs.FileHeat'))) || ' ' ||
       getmmconfig('fileHeatPeriodMinutes') || ' ' ||
       getmmconfig('fileHeatLossPercent') || ' ' ||
       getmmconfig('clusterName') )

[root@/main/gpfs-git]$ mmapplypolicy /c23 --maxdepth 1 -P /gh/policies/fileheat.policy -I test -L 3
...
<1> /c23/10g RULE 'fh2' LIST 'fh' WEIGHT(0.022363) SHOW( 238 17060 1024 +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com)
...
WEIGHT(0.022363) LIST 'fh' /c23/10g SHOW(238 17060 1024 +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com)

From: Jan-Frode Myklebust
To: gpfsug main discussion list
Date: 08/13/2019 06:22 AM
Subject: [EXTERNAL] Re: [gpfsug-discuss] Fileheat
Sent by: gpfsug-discuss-bounces at spectrumscale.org

What about filesystem atime updates. We recently changed the default to "relatime". Could that maybe influence heat tracking?

-jf

tir. 13. aug. 2019 kl.
11:29 skrev Ulrich Sibiller < u.sibiller at science-computing.de>: On 12.08.19 15:38, Marc A Kaplan wrote: > My Admin guide says: > > The loss percentage and period are set via the configuration > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By default, the file access temperature > is not > tracked. To use access temperature in policy, the tracking must first be enabled. To do this, set > the two > configuration variables as follows:* Yes, I am aware of that. > fileHeatLossPercent* > The percentage (between 0 and 100) of file access temperature dissipated over the* > fileHeatPeriodMinutes *time. The default value is 10. > Chapter 25. Information lifecycle management for IBM Spectrum Scale *361** > fileHeatPeriodMinutes* > The number of minutes defined for the recalculation of file access temperature. To turn on > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. The default value is 0 > > > SO Try setting both! Well, I have not because the documentation explicitly mentions a default. What's the point of a default if I have to explicitly configure it? > ALSO to take effect you may have to mmshutdown and mmstartup, at least on the (client gpfs) nodes > that are accessing the files of interest. I have now configured both parameters and restarted GPFS. Ran a tar over a directory - still no change. I will wait for 720minutes and retry (tomorrow). Thanks Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=q-xjYq0Bimv3bYK1rhVMZ7jLvoEssmvfyMF0kcf5slc&s=buMugfNwUbsXJ2Gi04A3ehIQ0v-ORRc-Mb7sxaGgLrA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From abhisdav at in.ibm.com Wed Aug 14 07:38:56 2019 From: abhisdav at in.ibm.com (Abhishek Dave) Date: Wed, 14 Aug 2019 12:08:56 +0530 Subject: [gpfsug-discuss] AFM and SELinux In-Reply-To: <83A6EEB0EC738F459A39439733AE80452BE96124@MBX114.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE80452BE96124@MBX114.d.ethz.ch> Message-ID: Hi, SELinux is officially not supported with AFM & ADR as of now. Support will be added in upcoming release. May i know if you are using kernel nfs or Ganesha and on which release? In my opinion Kernel NFS should work. 
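A quick way to confirm what each end is actually enforcing is the standard RHEL tooling, run on both the AFM gateway and the NFS server exporting home:

getenforce      # Enforcing / Permissive / Disabled
sestatus        # current mode and loaded policy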
Thanks, Abhishek, Dave From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 08/12/2019 06:40 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM and SELinux Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFS users, does anybody know if AFM behaves correctly if the AFM gateway has SELinux "Disabled" and NFS server has SElinux "Enforcing" ? thanks, Alvise_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=gmKxLriPwxbU8DBgwjZJwuhzDr6RwM7JisZU6_htZRw&m=M2nCwpuhe4WfjF_Qcq5iyTJkZzqBrYYtd7g1pQcev4Q&s=n8ZNq9h3P_quAD-7nWT42DPt8ZQJNehK35O0Tn-Dtd8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From alvise.dorigo at psi.ch Wed Aug 14 08:17:24 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Wed, 14 Aug 2019 07:17:24 +0000 Subject: [gpfsug-discuss] AFM and SELinux In-Reply-To: References: <83A6EEB0EC738F459A39439733AE80452BE96124@MBX114.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE80452BE96501@MBX114.d.ethz.ch> Hi Dave, thanks for the answer. Sorry when I mentioned NFS I was not talking about CES service, but the NFS server that export the HOME filesystem to the gateway AFM. A ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Abhishek Dave [abhisdav at in.ibm.com] Sent: Wednesday, August 14, 2019 8:38 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM and SELinux Hi, SELinux is officially not supported with AFM & ADR as of now. Support will be added in upcoming release. May i know if you are using kernel nfs or Ganesha and on which release? In my opinion Kernel NFS should work. Thanks, Abhishek, Dave [Inactive hide details for "Dorigo Alvise (PSI)" ---08/12/2019 06:40:34 PM---Dear GPFS users, does anybody know if AFM behaves c]"Dorigo Alvise (PSI)" ---08/12/2019 06:40:34 PM---Dear GPFS users, does anybody know if AFM behaves correctly if the AFM gateway has SELinux "Disabled From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 08/12/2019 06:40 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM and SELinux Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear GPFS users, does anybody know if AFM behaves correctly if the AFM gateway has SELinux "Disabled" and NFS server has SElinux "Enforcing" ? thanks, Alvise_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From lore at cscs.ch Fri Aug 16 16:49:05 2019 From: lore at cscs.ch (Lo Re Giuseppe) Date: Fri, 16 Aug 2019 15:49:05 +0000 Subject: [gpfsug-discuss] SS on RHEL 7.7 Message-ID: Hi, Has anybody tried to run Spectrum Scale on RHEL 7.7? It is not listed on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linuxrest ?. 
yet I would be interested to know whether anyone is already running it, in test or production systems. Thanks! Giuseppe -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Aug 16 16:03:41 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 16 Aug 2019 15:03:41 +0000 Subject: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Message-ID: <48973175-A514-4012-92F6-D63CF8054623@nuance.com> I want to run mmcheckquota on a file system I have on my ESS. It has 1.3B files. Obviously it will take a long time, but I?m looking for anyone who has guidance running this command on a large file system, what I should expect for IO load. - Should I set QOS to minimize impact? - Run directly on the ESS IO nodes, or some other nodes? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Aug 16 17:01:40 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 16 Aug 2019 12:01:40 -0400 Subject: [gpfsug-discuss] SS on RHEL 7.7 In-Reply-To: References: Message-ID: <92DF159B-3BF6-411D-9DB5-48C3E7852667@brown.edu> I installed it on a client machine that was accidentally upgraded to rhel7.7. There were two type mismatch warnings during the gplbin rpm build but gpfs started up and mounted the filesystem successfully. Client is running ss 5.0.3. -- ddj Dave Johnson > On Aug 16, 2019, at 11:49 AM, Lo Re Giuseppe wrote: > > Hi, > > Has anybody tried to run Spectrum Scale on RHEL 7.7? > It is not listed on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linuxrest > ?. yet > I would be interested to know whether anyone is already running it, in test or production systems. > > Thanks! > > Giuseppe > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.schlipalius at pawsey.org.au Mon Aug 19 03:35:02 2019 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Mon, 19 Aug 2019 10:35:02 +0800 Subject: [gpfsug-discuss] Announcing 2019 October 18th Australian Spectrum Scale User Group event - call for user case speakers Message-ID: <9C44E510-3777-4706-A435-E9C374CD2180@pawsey.org.au> Hello all, This is the announcement for the Spectrum Scale Usergroup Sydney Australia on Friday the 18th October 2019. All current draft Australian Spectrum Scale User Group event details can be found here: http://bit.ly/2YOFQ3u We are calling for user case speakers please ? let Ulf or myself know if you are available to speak at this Usergroup. Feel free to circulate this event link to all who may need it. Please reserve your tickets now as tickets for places will close soon. There are some great speakers and topics, for details please see the agenda on Eventbrite. This is a combined Spectrum Scale, Spectrum Archive, Spectrum Protect and Spectrum LSF event. We are looking forwards to a great Usergroup in Sydney. Thanks again to IBM for helping to arrange the venue and event booking. 
Best Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email GPFSUGAUS at gmail.com Web www.pawsey.org.au Regards, Chris Schlipalius IBM Champion 2019 Team Lead, Storage Infrastructure, Data & Visualisation, The Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 GPFSUGAUS at gmail.com From scale at us.ibm.com Mon Aug 19 14:24:31 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 19 Aug 2019 09:24:31 -0400 Subject: [gpfsug-discuss] =?utf-8?q?Running_mmcheckquota_on_a_file_system_?= =?utf-8?q?with_1=2E3B=09files?= In-Reply-To: <48973175-A514-4012-92F6-D63CF8054623@nuance.com> References: <48973175-A514-4012-92F6-D63CF8054623@nuance.com> Message-ID: Bob, like most questions of this time I think the answer depends on a number of variables. Generally we do not recommend running the mmcheckquota command during the peak usage of your Spectrum Scale system. As I think you know the command will increase the IO to the NSDs that hold metadata and the number of NSDs that hold metadata will contribute to the time it takes for the command to complete, i.e. more metadata NSDs should improve the overall execution time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 08/16/2019 11:57 AM Subject: [EXTERNAL] [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Sent by: gpfsug-discuss-bounces at spectrumscale.org I want to run mmcheckquota on a file system I have on my ESS. It has 1.3B files. Obviously it will take a long time, but I?m looking for anyone who has guidance running this command on a large file system, what I should expect for IO load. - Should I set QOS to minimize impact? - Run directly on the ESS IO nodes, or some other nodes? Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=WUpGs2Rz2EHGkbHv8FBmwheiONLdqvqSIS2FfIlCcc4&s=LX8or_PP0SLUsKP9Kb0wWE1u5jz84jQR-paJuklTvu4&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Aug 19 14:54:36 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 19 Aug 2019 13:54:36 +0000 Subject: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Message-ID: <97DDC9A8-6278-4249-BE39-A19C693CB341@nuance.com> Thanks, - I kicked it off and it finished in about 12 hours, so much quicker than I expected. 
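For anyone who does want to throttle a future run, a QoS sketch along these lines may help; the filesystem name and IOPS figure are only placeholders, and it is worth confirming on your release that mmcheckquota is actually charged to the maintenance class:

mmchqos fs1 --enable pool=system,maintenance=500IOPS,other=unlimited
mmlsqos fs1 --seconds 60      # watch the classes while the check runs
mmchqos fs1 --disable         # lift the throttle afterwards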
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Monday, August 19, 2019 at 8:24 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Bob, like most questions of this time I think the answer depends on a number of variables. Generally we do not recommend running the mmcheckquota command during the peak usage of your Spectrum Scale system. As I think you know the command will increase the IO to the NSDs that hold metadata and the number of NSDs that hold metadata will contribute to the time it takes for the command to complete, i.e. more metadata NSDs should improve the overall execution time. Regards, The Spectrum Scale (GPFS) team -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Mon Aug 19 21:01:14 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Mon, 19 Aug 2019 20:01:14 +0000 Subject: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files In-Reply-To: <97DDC9A8-6278-4249-BE39-A19C693CB341@nuance.com> References: <97DDC9A8-6278-4249-BE39-A19C693CB341@nuance.com> Message-ID: I'm assuming that was a run in the foreground and not using QoS? Our timings sound roughly similar for a Foreground run under 4.2.3.x. 1 hour and ~2 hours for 100million and 300 million each. Also I'm assuming actual file counts, not inode counts! Background is, of course, all over the place with QoS. I've seen between 8-12 hours for just 100 million files, but the NSDs on that FS were middling busy during those periods. I'd love to know if IBM has any "best practice" guidance for running mmcheckquota. Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Oesterlin, Robert Sent: Monday, August 19, 2019 9:54 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Thanks, - I kicked it off and it finished in about 12 hours, so much quicker than I expected. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Monday, August 19, 2019 at 8:24 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Running mmcheckquota on a file system with 1.3B files Bob, like most questions of this time I think the answer depends on a number of variables. Generally we do not recommend running the mmcheckquota command during the peak usage of your Spectrum Scale system. As I think you know the command will increase the IO to the NSDs that hold metadata and the number of NSDs that hold metadata will contribute to the time it takes for the command to complete, i.e. more metadata NSDs should improve the overall execution time. Regards, The Spectrum Scale (GPFS) team -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Aug 20 12:06:51 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 20 Aug 2019 11:06:51 +0000 Subject: [gpfsug-discuss] Spectrum Scale Technote mmap Message-ID: Hallo All, can everyone clarify the effected Level?s in witch ptf is the problem and in witch is not. The Abstract mean for v5.0.3.0 to 5.0.3.2. But in the content it says 5.0.3.0 to 5.0.3.3? 
https://www-01.ibm.com/support/docview.wss?uid=ibm10960396 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Aug 20 13:39:30 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 20 Aug 2019 08:39:30 -0400 Subject: [gpfsug-discuss] Spectrum Scale Technote mmap In-Reply-To: References: Message-ID: Since Spectrum Scale 5.0.3.3 has not yet been released I think the reference to it in the notice was incorrect. It should have referred to version 5.0.3.2 as it does in other statements. Thanks for noting the discrepancy. I will alert the appropriate folks so this can be fixed. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 08/20/2019 07:14 AM Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Technote mmap Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, can everyone clarify the effected Level?s in witch ptf is the problem and in witch is not. The Abstract mean for v5.0.3.0 to 5.0.3.2. But in the content it says 5.0.3.0 to 5.0.3.3? https://www-01.ibm.com/support/docview.wss?uid=ibm10960396 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. 
in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=7HNkFdDgnBPjNYNHC2exZU3YxzqRkjfxYIu7Uxfma_k&s=e7nXQUVcE5O1J1kBoN1r9KLJtho2yEAJHdTsUyFkMXA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Aug 21 17:03:12 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 21 Aug 2019 16:03:12 +0000 Subject: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x Message-ID: I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: 2. This is a known defect. The problem has been fixed through D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock A companion fix is D.1073753: Assert that the lock mode in DirLTE::lock is strong enough The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From scale at us.ibm.com Wed Aug 21 18:10:47 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 21 Aug 2019 13:10:47 -0400 Subject: [gpfsug-discuss] =?utf-8?q?mmfsd_segfault/signal_6_on_dirop=2EC?= =?utf-8?b?OjQ1NDggaW4gR1BGUwk1LjAuMi54?= In-Reply-To: References: Message-ID: As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ryan Novosielski To: gpfsug main discussion list Date: 08/21/2019 12:04 PM Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: 2. This is a known defect. The problem has been fixed through D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock A companion fix is D.1073753: Assert that the lock mode in DirLTE::lock is strong enough The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=2DWKJiKyweSkGrSB_31bZQerI4xIgc6Z_Pw7iTGZpH4&s=oLoaU67CVtDLGyv_LZO8AqZRU739wj1q-PysELBsBow&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Aug 21 18:13:44 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 21 Aug 2019 17:13:44 +0000 Subject: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x In-Reply-To: References: Message-ID: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 12:04 PM > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: > > 2. This is a known defect. > The problem has been fixed through > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > A companion fix is > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... 
The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Wed Aug 21 18:20:57 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 21 Aug 2019 13:20:57 -0400 Subject: [gpfsug-discuss] =?utf-8?q?mmfsd_segfault/signal_6_on_dirop=2EC?= =?utf-8?b?OjQ1NDggaW4JR1BGUwk1LjAuMi54?= In-Reply-To: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> References: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> Message-ID: To my knowledge there has been no notification sent regarding this problem. Generally we only notify customers about problems that impact file system data corruption or data loss. This problem does cause the GPFS instance to abort and restart (assert) but it does not impact file system data. It seems in your case you may have been encountering the problem frequently. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ryan Novosielski To: gpfsug main discussion list Date: 08/21/2019 01:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. 
Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 12:04 PM > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: > > 2. This is a known defect. > The problem has been fixed through > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > A companion fix is > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=Vb5n4LNvsDMO5ku78vPAo49t9F87b7EsHcty3P7vTwo&s=yjhPrnHKQtpBzlJ-8gaCH4CHoo6pMXFMf1vLsCFg9WU&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=Vb5n4LNvsDMO5ku78vPAo49t9F87b7EsHcty3P7vTwo&s=yjhPrnHKQtpBzlJ-8gaCH4CHoo6pMXFMf1vLsCFg9WU&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=Vb5n4LNvsDMO5ku78vPAo49t9F87b7EsHcty3P7vTwo&s=yjhPrnHKQtpBzlJ-8gaCH4CHoo6pMXFMf1vLsCFg9WU&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Aug 21 18:34:03 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 21 Aug 2019 17:34:03 +0000 Subject: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x In-Reply-To: References: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> Message-ID: <6170DFBB-070A-4C72-816C-4D4F3D8B77FF@rutgers.edu> If there is any means for feedback, I really think that anything that causes a crash of mmfsd is absolutely cause to send a notice. Regardless of data corruption, it makes the software unusable in production under certain circumstances. There was a large customer impact at our site. We have a reproducible case if it is useful. One customer workload crashed every time, though it took almost a full day to get to that point so you can imagine the time wasted. > On Aug 21, 2019, at 1:20 PM, IBM Spectrum Scale wrote: > > To my knowledge there has been no notification sent regarding this problem. Generally we only notify customers about problems that impact file system data corruption or data loss. This problem does cause the GPFS instance to abort and restart (assert) but it does not impact file system data. It seems in your case you may have been encountering the problem frequently. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 01:14 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > From: Ryan Novosielski > > To: gpfsug main discussion list > > Date: 08/21/2019 12:04 PM > > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: > > > > 2. This is a known defect. > > The problem has been fixed through > > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > > A companion fix is > > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). 
> > > > -- > > ____ > > || \\UTGERS, |---------------------------*O*--------------------------- > > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > > `' > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Wed Aug 21 18:46:40 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 21 Aug 2019 13:46:40 -0400 Subject: [gpfsug-discuss] =?utf-8?q?mmfsd_segfault/signal_6_on_dirop=2EC?= =?utf-8?b?OjQ1NDgJaW4JR1BGUwk1LjAuMi54?= In-Reply-To: <6170DFBB-070A-4C72-816C-4D4F3D8B77FF@rutgers.edu> References: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> <6170DFBB-070A-4C72-816C-4D4F3D8B77FF@rutgers.edu> Message-ID: We do appreciate the feedback. Since Spectrum Scale is a cluster based solution we do not consider the failure of a single node significant since the cluster will adjust to the loss of the node and access to the file data is not lost. It seems in this specific instance this problem was having a more significant impact in your environment. Presumably you have installed the available fix and are no longer encountering the problem. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ryan Novosielski To: gpfsug main discussion list Date: 08/21/2019 01:34 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org If there is any means for feedback, I really think that anything that causes a crash of mmfsd is absolutely cause to send a notice. Regardless of data corruption, it makes the software unusable in production under certain circumstances. There was a large customer impact at our site. We have a reproducible case if it is useful. One customer workload crashed every time, though it took almost a full day to get to that point so you can imagine the time wasted. > On Aug 21, 2019, at 1:20 PM, IBM Spectrum Scale wrote: > > To my knowledge there has been no notification sent regarding this problem. 
Generally we only notify customers about problems that impact file system data corruption or data loss. This problem does cause the GPFS instance to abort and restart (assert) but it does not impact file system data. It seems in your case you may have been encountering the problem frequently. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 01:14 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > From: Ryan Novosielski > > To: gpfsug main discussion list > > Date: 08/21/2019 12:04 PM > > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. 
From IBM Support: > > > > 2. This is a known defect. > > The problem has been fixed through > > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > > A companion fix is > > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). > > > > -- > > ____ > > || \\UTGERS, |---------------------------*O*--------------------------- > > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > > `' > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=HCjHUpCQ9fP06jd_TYHBTYqKKqy5-Uz6_whU-Q2N7Sg&s=ZohtBw4iz6ohlaFeZWXuNdHzw59RCEwLBbCHMXRRAkk&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From novosirj at rutgers.edu Wed Aug 21 18:54:28 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 21 Aug 2019 17:54:28 +0000 Subject: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x In-Reply-To: References: <2D8F22F5-1AE6-48EE-804F-8EB4AA3284B5@rutgers.edu> <6170DFBB-070A-4C72-816C-4D4F3D8B77FF@rutgers.edu> Message-ID: <8D9E4CAF-8B31-42C5-82B3-3B5EE4D132D3@rutgers.edu> There again, I would disagree. It is true that any one occurrence would affect one node, but it?s quite possible to conceive of a scenario where it happened on all nodes at around the same time, if conducting the right operations. We haven?t installed the EFIX (we have Lenovo equipment and generally use the Lenovo software ? not sure how that fits in here), but we applied the described workaround (disableAssert=dirop.C:4548) and are planning to upgrade our clients as soon as we can get 5.0.3.3 software (since <5.0.3.3 contains the other mmap read bug). > On Aug 21, 2019, at 1:46 PM, IBM Spectrum Scale wrote: > > We do appreciate the feedback. Since Spectrum Scale is a cluster based solution we do not consider the failure of a single node significant since the cluster will adjust to the loss of the node and access to the file data is not lost. It seems in this specific instance this problem was having a more significant impact in your environment. Presumably you have installed the available fix and are no longer encountering the problem. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ryan Novosielski > To: gpfsug main discussion list > Date: 08/21/2019 01:34 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > If there is any means for feedback, I really think that anything that causes a crash of mmfsd is absolutely cause to send a notice. Regardless of data corruption, it makes the software unusable in production under certain circumstances. There was a large customer impact at our site. We have a reproducible case if it is useful. One customer workload crashed every time, though it took almost a full day to get to that point so you can imagine the time wasted. > > > On Aug 21, 2019, at 1:20 PM, IBM Spectrum Scale wrote: > > > > To my knowledge there has been no notification sent regarding this problem. Generally we only notify customers about problems that impact file system data corruption or data loss. This problem does cause the GPFS instance to abort and restart (assert) but it does not impact file system data. It seems in your case you may have been encountering the problem frequently. 
> > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > From: Ryan Novosielski > > To: gpfsug main discussion list > > Date: 08/21/2019 01:14 PM > > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > Has there been any official notification of this one? I can?t see anything about it anyplace other than in my support ticket. > > > > -- > > ____ > > || \\UTGERS, |---------------------------*O*--------------------------- > > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > > `' > > > > > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale wrote: > > > > > > As was noted this problem is fixed in the Spectrum Scale 5.0.3 release stream. Regarding the version number format of 5.0.2.0/1 I assume that it is meant to convey version 5.0.2 efix 1. > > > > > > Regards, The Spectrum Scale (GPFS) team > > > > > > ------------------------------------------------------------------------------------------------------------------ > > > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > > > > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > > > > > From: Ryan Novosielski > > > To: gpfsug main discussion list > > > Date: 08/21/2019 12:04 PM > > > Subject: [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x > > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > > > > > I posted this on Slack, but it?s serious enough that I want to make sure everyone sees it. Does anyone, from IBM or otherwise, have any more information about this/whether it was even announced anyplace? Thanks! > > > > > > A little late, but we ran into a relatively serious problem at our site with 5.0.2.3 at our site. The symptom is a mmfsd crash/segfault related to fs/dirop.C:4548. We ran into this sporadically, but it was repeatable on the problem workload. From IBM Support: > > > > > > 2. This is a known defect. 
> > > The problem has been fixed through > > > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock > > > A companion fix is > > > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough > > > > > > > > > The rep further said "It's not an APAR since it's found in internal testing. It's an internal function at a place it should not assert but a part of the condition as the code path is specific to the DIR_UPDATE_LOCKMODE optimization code... The assert was meant for certain file creation code path, but the condition wasn't set strictly for that code path that some other code path could also run into the assert. So we cannot predict on which node it would happen.? > > > > > > The fix was setting disableAssert="dirop.C:4548, which can be done live. Anyone seen anything else about this anyplace? The bug is fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this version number means; I?ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and others). > > > > > > -- > > > ____ > > > || \\UTGERS, |---------------------------*O*--------------------------- > > > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > > > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > > > `' > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Craig.Abram at gmfinancial.com Thu Aug 29 14:58:36 2019 From: Craig.Abram at gmfinancial.com (Craig.Abram at gmfinancial.com) Date: Thu, 29 Aug 2019 13:58:36 +0000 Subject: [gpfsug-discuss] Backup question Message-ID: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Are there any other options to backup up GPFS other that Spectrum Protect ? ________________________________ Notice to all users The information contained in this email, including any attachment(s) is confidential and intended solely for the addressee and may contain privileged, confidential or restricted information. If you are not the intended recipient or responsible to deliver to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you received this message in error please notify the originator and then delete. Neither, the sender or GMF's network will be liable for direct, indirect or consequential infection by viruses associated with this email. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Aug 29 15:13:14 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 29 Aug 2019 14:13:14 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: You can backup GPFS with basically anything that can read a POSIX filesystem. Do you refer to mmbackup integration? -- Cheers > On 29 Aug 2019, at 17.09, Craig.Abram at gmfinancial.com wrote: > > > > Are there any other options to backup up GPFS other that Spectrum Protect ? > > > > Notice to all users The information contained in this email, including any attachment(s) is confidential and intended solely for the addressee and may contain privileged, confidential or restricted information. If you are not the intended recipient or responsible to deliver to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you received this message in error please notify the originator and then delete. Neither, the sender or GMF's network will be liable for direct, indirect or consequential infection by viruses associated with this email. Ellei edellä ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Aug 29 15:14:34 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 29 Aug 2019 14:14:34 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: An HTML attachment was scrubbed... URL: From oehmes at gmail.com Thu Aug 29 15:19:23 2019 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 29 Aug 2019 07:19:23 -0700 Subject: [gpfsug-discuss] Backup question In-Reply-To: References: Message-ID: <08F5DE8C-9C95-45D8-8711-229D76985329@gmail.com> while it is true that you can backup the data with everything that can read a POSIX filesystem, you will miss all the associated metadata like extended attributes and ACLs. besides mmbackup (which uses spectrum protect) DDN also offers a product for data management including backup/restore (but also migration and other scenarios) that preserves the metadata information. you can get more info here -> https://www.ddn.com/products/dataflow-backup-archive-data-migration/ sven Sent from my iPad > On Aug 29, 2019, at 7:13 AM, Luis Bolinches wrote: > > You can backup GPFS with basically anything that can read a POSIX filesystem. > > Do you refer to mmbackup integration? > > -- > Cheers > >> On 29 Aug 2019, at 17.09, Craig.Abram at gmfinancial.com wrote: >> >> >> >> Are there any other options to backup up GPFS other that Spectrum Protect ? >> >> >> >> Notice to all users The information contained in this email, including any attachment(s) is confidential and intended solely for the addressee and may contain privileged, confidential or restricted information. If you are not the intended recipient or responsible to deliver to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you received this message in error please notify the originator and then delete.
Neither, the sender or GMF's network will be liable for direct, indirect or consequential infection by viruses associated with this email. > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Aug 30 03:30:22 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 29 Aug 2019 19:30:22 -0700 Subject: [gpfsug-discuss] Hold the Date - September 23 and 24 In-Reply-To: <3F2B08E9-C6E3-412B-9308-D79E3480C5DA@lbl.gov> References: <3F2B08E9-C6E3-412B-9308-D79E3480C5DA@lbl.gov> Message-ID: <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> Hello, You will now find the nearly complete agenda here: https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ As noted before, the event is free, but please do register below to help with catering planning. You can find more information about the full HPCXXL event here: http://hpcxxl.org/ Any questions let us know. Hope to see you there! -Kristy > On Jul 2, 2019, at 10:45 AM, Kristy Kallback-Rose wrote: > > Hello, > > HPCXXL will be hosted by NERSC (Berkeley, CA) this September. As part of this event, there will be approximately a day and a half on GPFS content. We have done this type of event in the past, and as before, the GPFS days will be free to attend, but you do need to register. > > We?ll have more details soon, mark your calendars. > > Initial details: https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ > > Best, > Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Fri Aug 30 09:07:24 2019 From: jtucker at pixitmedia.com (Jez Tucker) Date: Fri, 30 Aug 2019 09:07:24 +0100 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: Hey ho, ? You may wish to evaluate NetBackup (I have not). https://www.veritas.com/content/support/en_US/doc/18716246-126559472-0/v107946958-126559472 You can accelerate most 3rd party backup solutions by driving file lists via policy. With a bit of additional development, it's not hard to achieve something extremely close to mmbackup. Perhaps the prudent question is; - What issue(s) make Spectrum Protect not the first solution of choice? Best, Jez On 29/08/2019 14:58, Craig.Abram at gmfinancial.com wrote: > > Are there any other options to backup up GPFS other that Spectrum > Protect ? > > > ------------------------------------------------------------------------ > > Notice to all users The information contained in this email, including > any attachment(s) is confidential and intended solely for the > addressee and may contain privileged, confidential or restricted > information. If you are not the intended recipient or responsible to > deliver to the intended recipient, you are hereby notified that any > dissemination, distribution or copying of this communication is > strictly prohibited. If you received this message in error please > notify the originator and then delete. 
Neither, the sender or GMF's > network will be liable for direct, indirect or consequential infection > by viruses associated with this email. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Aug 30 14:16:17 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 30 Aug 2019 09:16:17 -0400 Subject: [gpfsug-discuss] Backup question In-Reply-To: References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: Certainly fair and understandable to look at alternatives to the IBM backup/restore/HSM products and possibly mixing-matching. But thanks, Jez for raising the question: What do you see as strengths and weaknesses of the alternatives, IBM and others? AND as long as you are considering alternatives, here's another: HPSS http://www.hpss-collaboration.org/ghi.shtml which, from the beginning, was designed to work with GPFS. From: Jez Tucker To: gpfsug-discuss at spectrumscale.org Date: 08/30/2019 04:07 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Backup question Sent by: gpfsug-discuss-bounces at spectrumscale.org Hey ho, ? You may wish to evaluate NetBackup (I have not). https://www.veritas.com/content/support/en_US/doc/18716246-126559472-0/v107946958-126559472 You can accelerate most 3rd party backup solutions by driving file lists via policy. With a bit of additional development, it's not hard to achieve something extremely close to mmbackup. Perhaps the prudent question is; - What issue(s) make Spectrum Protect not the first solution of choice? Best, Jez On 29/08/2019 14:58, Craig.Abram at gmfinancial.com wrote: Are there any other options to backup up GPFS other that Spectrum Protect ? Notice to all users The information contained in this email, including any attachment(s) is confidential and intended solely for the addressee and may contain privileged, confidential or restricted information. If you are not the intended recipient or responsible to deliver to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you received this message in error please notify the originator and then delete. Neither, the sender or GMF's network will be liable for direct, indirect or consequential infection by viruses associated with this email. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Jez Tucker Head of Research and Development, Pixit Media 07764193820?|?jtucker at pixitmedia.com www.pixitmedia.com?|?Tw:@pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=U27EFbJGDImYvJu1bHWkn2F8wDttdtFDvQnazTtsZU4&s=OeU7xEE9o5ycn4_u3x4W3W_qraCPmVvftuWSfObWmso&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From daniel.kidger at uk.ibm.com Fri Aug 30 16:07:45 2019 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Fri, 30 Aug 2019 15:07:45 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: References: , <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=0ABB0EF5DFDB98B68f9e8a93df938690918c0AB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: