From S.J.Thompson at bham.ac.uk Mon Jan 4 12:21:05 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 4 Jan 2021 12:21:05 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools Message-ID: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Hi All, We use Spectrum Protect (TSM) to backup our Scale filesystems. We have the backup setup to use multiple nodes with the PROXY node function turned on (and to some extent also use multiple target servers). This all feels like it is nice and parallel, on the TSM servers, we have disk pools for any ?small? files to drop into (I think we set anything smaller than 20GB) to prevent lots of small files stalling tape drive writes. Whilst digging into why we have slow backups at times, we found that the disk pool empties with a single thread (one drive). And looking at the docs: https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints This implies that we are limited to the number of client nodes stored in the pool. i.e. because we have one node and PROXY nodes, we are essentially limited to a single thread streaming out of the disk pool when full. Have we understood this correctly as if so, this appears to make the whole purpose of PROXY nodes sort of pointless if you have lots of small files. Or is there some other setting we should be looking at to increase the number of threads when the disk pool is emptying? (The disk pool itself has Migration Processes: 6) Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From jordi.caubet at es.ibm.com Mon Jan 4 13:36:40 2021 From: jordi.caubet at es.ibm.com (Jordi Caubet Serrabou) Date: Mon, 4 Jan 2021 13:36:40 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Jan 4 13:37:50 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 4 Jan 2021 19:07:50 +0530 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: Hi Diane, Can you help Simon with the below query. Or else would you know who would be the best person to be contacted here. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 04-01-2021 05.51 PM Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Protect and disk pools Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We use Spectrum Protect (TSM) to backup our Scale filesystems. 
We have the backup setup to use multiple nodes with the PROXY node function turned on (and to some extent also use multiple target servers).

This all feels like it is nice and parallel. On the TSM servers, we have disk pools for any "small" files to drop into (I think we set anything smaller than 20GB) to prevent lots of small files stalling tape drive writes.

Whilst digging into why we have slow backups at times, we found that the disk pool empties with a single thread (one drive). And looking at the docs:
https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints

This implies that we are limited to the number of client nodes stored in the pool, i.e. because we have one node and PROXY nodes, we are essentially limited to a single thread streaming out of the disk pool when full.

Have we understood this correctly? If so, this appears to make the whole purpose of PROXY nodes sort of pointless if you have lots of small files. Or is there some other setting we should be looking at to increase the number of threads when the disk pool is emptying? (The disk pool itself has Migration Processes: 6)

Thanks

Simon
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL:

From S.J.Thompson at bham.ac.uk Mon Jan 4 13:52:05 2021
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Mon, 4 Jan 2021 13:52:05 +0000
Subject: [gpfsug-discuss] Spectrum Protect and disk pools
In-Reply-To:
References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk>
Message-ID: <62F6E92A-31B4-45BE-9FF7-E6DBE0F7526B@bham.ac.uk>

Hi Jordi,

Thanks, yes it is a disk pool:

Protect: TSM01>q stg BACKUP_DISK f=d
            Storage Pool Name: BACKUP_DISK
            Storage Pool Type: Primary
            Device Class Name: DISK
                 Storage Type: DEVCLASS
            Next Storage Pool: BACKUP_ONSTAPE

So it is a disk pool, though it is made up of multiple disk files:

/tsmdisk/stgpool/tsmins-   BACKUP_DISK   DISK   200.0 G   0.0   On-Line
  t3/bkup_diskvol01.dsm
/tsmdisk/stgpool/tsmins-   BACKUP_DISK   DISK   200.0 G   0.0   On-Line
  t3/bkup_diskvol02.dsm
/tsmdisk/stgpool/tsmins-   BACKUP_DISK   DISK   200.0 G   0.0   On-Line
  t3/bkup_diskvol03.dsm

Will look into the FILE pool as this sounds like it might be less single threaded than now.

Thanks

Simon

From: on behalf of "jordi.caubet at es.ibm.com"
Reply to: "gpfsug-discuss at spectrumscale.org"
Date: Monday, 4 January 2021 at 13:36
To: "gpfsug-discuss at spectrumscale.org"
Subject: Re: [gpfsug-discuss] Spectrum Protect and disk pools

Simon,

which kind of storage pool are you using, DISK or FILE? I understand DISK pool from your mail.

DISK pool does not behave the same as FILE pool. A DISK pool is limited by the number of nodes or the MIGProcess setting (the minimum of both), as the document states. Using proxy helps you back up in parallel from multiple nodes to the storage pool, but from the Protect perspective it is a single node: even when multiple nodes are sending, they run "asnodename", so it is a single node from the Protect perspective.

If using a FILE pool, you can define the number of volumes within the FILE pool, and when migrating to tape it will migrate each volume in parallel, up to the limit of the MIGProcess setting. So it would be the minimum of #volumes and the MIGProcess value.
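As a rough illustration (device class name, sizes and directory below are placeholders, not taken from this thread), a FILE pool feeding the existing tape pool could be defined from the Protect administrative command line along these lines:

    DEFINE DEVCLASS FILECLASS DEVTYPE=FILE MAXCAPACITY=20G MOUNTLIMIT=32 DIRECTORY=/tsmfile
    DEFINE STGPOOL BACKUP_FILE FILECLASS MAXSCRATCH=200 NEXTSTGPOOL=BACKUP_ONSTAPE HIGHMIG=90 LOWMIG=70 MIGPROCESS=6

Because each full FILE volume can then be handled by its own migration process, the parallelism when the pool empties becomes roughly the minimum of the number of volumes and MIGPROCESS, rather than being tied to the single proxy node.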
I know more deep technical skills in Protect are on this mailing list so feel free to add something or correct me. Best Regards, -- Jordi Caubet Serrabou IBM Storage Client Technical Specialist (IBM Spain) Ext. Phone: (+34) 679.79.17.84 (internal 55834) E-mail: jordi.caubet at es.ibm.com -----gpfsug-discuss-bounces at spectrumscale.org wrote: ----- To: "gpfsug-discuss at spectrumscale.org" > From: Simon Thompson Sent by: gpfsug-discuss-bounces at spectrumscale.org Date: 01/04/2021 01:21PM Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Protect and disk pools Hi All, We use Spectrum Protect (TSM) to backup our Scale filesystems. We have the backup setup to use multiple nodes with the PROXY node function turned on (and to some extent also use multiple target servers). This all feels like it is nice and parallel, on the TSM servers, we have disk pools for any ?small? files to drop into (I think we set anything smaller than 20GB) to prevent lots of small files stalling tape drive writes. Whilst digging into why we have slow backups at times, we found that the disk pool empties with a single thread (one drive). And looking at the docs: https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints This implies that we are limited to the number of client nodes stored in the pool. i.e. because we have one node and PROXY nodes, we are essentially limited to a single thread streaming out of the disk pool when full. Have we understood this correctly as if so, this appears to make the whole purpose of PROXY nodes sort of pointless if you have lots of small files. Or is there some other setting we should be looking at to increase the number of threads when the disk pool is emptying? (The disk pool itself has Migration Processes: 6) Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Salvo indicado de otro modo m?s arriba / Unless stated otherwise above: International Business Machines, S.A. Santa Hortensia, 26-28, 28002 Madrid Registro Mercantil de Madrid; Folio 1; Tomo 1525; Hoja M-28146 CIF A28-010791 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Mon Jan 4 15:27:31 2021 From: skylar2 at uw.edu (Skylar Thompson) Date: Mon, 4 Jan 2021 07:27:31 -0800 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: <20210104152731.mwgcj2caojjalony@thargelion> I think the collocation settings of the target pool for the migration come into play as well. If you have multiple filespaces associated with a node and collocation is set to FILESPACE, then you should be able to get one migration process per filespace rather than one per node/collocation group. On Mon, Jan 04, 2021 at 12:21:05PM +0000, Simon Thompson wrote: > Hi All, > > We use Spectrum Protect (TSM) to backup our Scale filesystems. We have the backup setup to use multiple nodes with the PROXY node function turned on (and to some extent also use multiple target servers). > > This all feels like it is nice and parallel, on the TSM servers, we have disk pools for any ???small??? files to drop into (I think we set anything smaller than 20GB) to prevent lots of small files stalling tape drive writes. > > Whilst digging into why we have slow backups at times, we found that the disk pool empties with a single thread (one drive). 
And looking at the docs: > https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints > > This implies that we are limited to the number of client nodes stored in the pool. i.e. because we have one node and PROXY nodes, we are essentially limited to a single thread streaming out of the disk pool when full. > > Have we understood this correctly as if so, this appears to make the whole purpose of PROXY nodes sort of pointless if you have lots of small files. Or is there some other setting we should be looking at to increase the number of threads when the disk pool is emptying? (The disk pool itself has Migration Processes: 6) > > Thanks > > Simon > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From jonathan.buzzard at strath.ac.uk Mon Jan 4 16:24:25 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 4 Jan 2021 16:24:25 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: On 04/01/2021 12:21, Simon Thompson wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > > Hi All, > > We use Spectrum Protect (TSM) to backup our Scale filesystems. We have > the backup setup to use multiple nodes with the PROXY node function > turned on (and to some extent also use multiple target servers). > > This all feels like it is nice and parallel, on the TSM servers, we have > disk pools for any ?small? files to drop into (I think we set anything > smaller than 20GB) to prevent lots of small files stalling tape drive > writes. > > Whilst digging into why we have slow backups at times, we found that the > disk pool empties with a single thread (one drive). And looking at the docs: > > https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints > > > This implies that we are limited to the number of client nodes stored in > the pool. i.e. because we have one node and PROXY nodes, we are > essentially limited to a single thread streaming out of the disk pool > when full. > > Have we understood this correctly as if so, this appears to make the > whole purpose of PROXY nodes sort of pointless if you have lots of small > files. Or is there some other setting we should be looking at to > increase the number of threads when the disk pool is emptying? (The disk > pool itself has Migration Processes: 6) > I have found in the past that the speed of the disk pool can make a large difference. That is a RAID5/6 of 7200RPM drives was inadequate and there was a significant boost in backup in moving to 15k RPM disks. Also your DB really needs to be on SSD, again this affords a large boost in backup speed. The other rule of thumb I have always worked with is that the disk pool should be sized for the daily churn. That is your backup should disappear into the disk pool and then when the backup is finished you can then spit the disk pool out to the primary and copy pools. If you are needing to drain the disk pool mid backup your disk pool is too small. TL;DR your TSM disks (DB and disk pool) need to be some of the best storage you have to maximize backup speed. JAB. 
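One common way to keep the pool from draining mid-backup (besides sizing it for the daily churn) is to hold migration during the backup window and release it afterwards, typically from an administrative schedule. A sketch only, with illustrative threshold values and the pool name used earlier in this thread:

    update stgpool BACKUP_DISK highmig=100 lowmig=99
    update stgpool BACKUP_DISK highmig=0 lowmig=0

The first command goes before the nightly backups start, so nothing migrates while clients are writing; the second runs once they finish, to drain the pool out to the primary and copy pools.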
-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Alec.Effrat at wellsfargo.com Mon Jan 4 17:30:39 2021 From: Alec.Effrat at wellsfargo.com (Alec.Effrat at wellsfargo.com) Date: Mon, 4 Jan 2021 17:30:39 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: <151a06b1b52545fca2f92d3a5e3ce943@wellsfargo.com> I am not sure what platform you run on but for AIX with a fully virtualized LPAR we needed to enable "mtu_bypass" on the en device that was used for our backups. Prior to this setting we could not exceed 250 MB/s on our 10G interface, after that we run at 1.6GB/s solid per 10G virtual adapter, fueled by Spectrum Scale and a different backup engine. We did lose a lot of sleep trying to figure this one out, but are very pleased with the end result. Alec Effrat SAS Lead, AVP Business Intelligence Competency Center SAS Administration Cell?949-246-7713 alec.effrat at wellsfargo.com -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: Monday, January 4, 2021 8:24 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Protect and disk pools On 04/01/2021 12:21, Simon Thompson wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > > Hi All, > > We use Spectrum Protect (TSM) to backup our Scale filesystems. We have > the backup setup to use multiple nodes with the PROXY node function > turned on (and to some extent also use multiple target servers). > > This all feels like it is nice and parallel, on the TSM servers, we > have disk pools for any ?small? files to drop into (I think we set > anything smaller than 20GB) to prevent lots of small files stalling > tape drive writes. > > Whilst digging into why we have slow backups at times, we found that > the disk pool empties with a single thread (one drive). And looking at the docs: > > https://www.ibm.com/support/pages/concurrent-migration-processes-and-c > onstraints > .ibm.com%2Fsupport%2Fpages%2Fconcurrent-migration-processes-and-constr > aints&data=04%7C01%7Cjonathan.buzzard%40strath.ac.uk%7C99158004dad04c7 > 9a58808d8b0ab39b8%7C631e0763153347eba5cd0457bee5944e%7C0%7C0%7C6374535 > 96745356438%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMz > IiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=ZPUkTB5Vy5S0%2BL67neMp4C > 1lxIuphMS5HuTkBYcmcMU%3D&reserved=0> > > This implies that we are limited to the number of client nodes stored > in the pool. i.e. because we have one node and PROXY nodes, we are > essentially limited to a single thread streaming out of the disk pool > when full. > > Have we understood this correctly as if so, this appears to make the > whole purpose of PROXY nodes sort of pointless if you have lots of > small files. Or is there some other setting we should be looking at to > increase the number of threads when the disk pool is emptying? (The > disk pool itself has Migration Processes: 6) > I have found in the past that the speed of the disk pool can make a large difference. That is a RAID5/6 of 7200RPM drives was inadequate and there was a significant boost in backup in moving to 15k RPM disks. Also your DB really needs to be on SSD, again this affords a large boost in backup speed. 
The other rule of thumb I have always worked with is that the disk pool should be sized for the daily churn. That is your backup should disappear into the disk pool and then when the backup is finished you can then spit the disk pool out to the primary and copy pools. If you are needing to drain the disk pool mid backup your disk pool is too small. TL;DR your TSM disks (DB and disk pool) need to be some of the best storage you have to maximize backup speed. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From xhejtman at ics.muni.cz Wed Jan 6 17:46:58 2021 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 6 Jan 2021 18:46:58 +0100 Subject: [gpfsug-discuss] S3 API and POSIX rights Message-ID: <20210106174658.GA1764842@ics.muni.cz> Hello, we are playing a bit with Spectrum Scale OBJ storage. We were able to get working unified access for NFS and OBJ but only if we use swift clients. If we use s3 client for OBJ, all objects are owned by swift user and large objects are multiparted wich is not suitable for unified access. Should the unified access work also for S3 API? Or only swift is supported currently? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From jcatana at gmail.com Wed Jan 6 22:30:08 2021 From: jcatana at gmail.com (Josh Catana) Date: Wed, 6 Jan 2021 17:30:08 -0500 Subject: [gpfsug-discuss] S3 API and POSIX rights In-Reply-To: <20210106174658.GA1764842@ics.muni.cz> References: <20210106174658.GA1764842@ics.muni.cz> Message-ID: Swift and s3 are both object storage, but different protocol implementation. Not compatible. I use minio to share data for s3 compatibility. On Wed, Jan 6, 2021, 12:52 PM Lukas Hejtmanek wrote: > Hello, > > we are playing a bit with Spectrum Scale OBJ storage. We were able to get > working unified access for NFS and OBJ but only if we use swift clients. > If we > use s3 client for OBJ, all objects are owned by swift user and large > objects > are multiparted wich is not suitable for unified access. > > Should the unified access work also for S3 API? Or only swift is supported > currently? > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brnelson at us.ibm.com Thu Jan 7 00:18:28 2021 From: brnelson at us.ibm.com (Brian Nelson) Date: Wed, 6 Jan 2021 18:18:28 -0600 Subject: [gpfsug-discuss] S3 API and POSIX rights Message-ID: Unfortunately, these features are not supported. Multipart uploads are not supported with Unified File and Object for the reason you mentioned, as the separate parts of the object are written as separate files. And because the S3 and Swift authentication is handled differently, the user is not passed through in the S3 path. Without the user information, the Unified File and Object layer is not able to set the file ownership to the external authentication user. Ownership is set to the default of 'swift' in that case. 
-Brian =================================== Brian Nelson 512-286-7735 (T/L) 363-7735 IBM Spectrum Scale brnelson at us.ibm.com On Wed, Jan 6, 2021, 12:52 PM Lukas Hejtmanek wrote: > Hello, > > we are playing a bit with Spectrum Scale OBJ storage. We were able to get > working unified access for NFS and OBJ but only if we use swift clients. > If we > use s3 client for OBJ, all objects are owned by swift user and large > objects > are multiparted wich is not suitable for unified access. > > Should the unified access work also for S3 API? Or only swift is supported > currently? > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Jan 7 08:36:25 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 7 Jan 2021 14:06:25 +0530 Subject: [gpfsug-discuss] S3 API and POSIX rights In-Reply-To: <20210106174658.GA1764842@ics.muni.cz> References: <20210106174658.GA1764842@ics.muni.cz> Message-ID: Hi Brian, Can you please answer the below S3 API related query. Or would you know who would be the right person to forward this to. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Lukas Hejtmanek To: gpfsug-discuss at spectrumscale.org Date: 06-01-2021 11.22 PM Subject: [EXTERNAL] [gpfsug-discuss] S3 API and POSIX rights Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, we are playing a bit with Spectrum Scale OBJ storage. We were able to get working unified access for NFS and OBJ but only if we use swift clients. If we use s3 client for OBJ, all objects are owned by swift user and large objects are multiparted wich is not suitable for unified access. Should the unified access work also for S3 API? Or only swift is supported currently? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From juergen.hannappel at desy.de Fri Jan 8 12:27:27 2021 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Fri, 8 Jan 2021 13:27:27 +0100 (CET) Subject: [gpfsug-discuss] GPFS_CLEAR_FILE_CACHE fails on Read-Only FS Message-ID: <933204168.28588491.1610108847123.JavaMail.zimbra@desy.de> Hi, in a program after reading a file I did a gpfs_fcntl() with GPFS_CLEAR_FILE_CACHE to get rid of the now unused pages in the file cache. That works fine, but if the file system is read-only (in a remote cluster) this fails with a message that the file system is read only. Is that expected behaviour or an unexpected feature (aka bug)? -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1711 bytes Desc: S/MIME Cryptographic Signature URL: From scale at us.ibm.com Fri Jan 8 13:42:25 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 8 Jan 2021 08:42:25 -0500 Subject: [gpfsug-discuss] GPFS_CLEAR_FILE_CACHE fails on Read-Only FS In-Reply-To: <933204168.28588491.1610108847123.JavaMail.zimbra@desy.de> References: <933204168.28588491.1610108847123.JavaMail.zimbra@desy.de> Message-ID: It seems like a defect. Could you please open a help case and if possible provide a sample program and the steps you took to create the problem? Also, please provide the version of Scale you are using where you see this behavior. This should result in a defect being opened against GPFS which will then be addressed by a member of the development team. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Hannappel, Juergen" To: gpfsug main discussion list Date: 01/08/2021 07:33 AM Subject: [EXTERNAL] [gpfsug-discuss] GPFS_CLEAR_FILE_CACHE fails on Read-Only FS Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, in a program after reading a file I did a gpfs_fcntl() with GPFS_CLEAR_FILE_CACHE to get rid of the now unused pages in the file cache. That works fine, but if the file system is read-only (in a remote cluster) this fails with a message that the file system is read only. Is that expected behaviour or an unexpected feature (aka bug)? -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 [attachment "smime.p7s" deleted by Frederick Stock/Pittsburgh/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hoov at us.ibm.com Mon Jan 11 17:32:48 2021 From: hoov at us.ibm.com (Theodore Hoover Jr) Date: Mon, 11 Jan 2021 17:32:48 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cloud Online Survey In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16082105961220.jpg Type: image/jpeg Size: 6839 bytes Desc: not available URL: From Philipp.Rehs at uni-duesseldorf.de Mon Jan 11 18:53:29 2021 From: Philipp.Rehs at uni-duesseldorf.de (Rehs, Philipp Helo) Date: Mon, 11 Jan 2021 18:53:29 +0000 Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Message-ID: <63152da6-4464-4497-b4d2-11f8d2260614@email.android.com> Hello, we have gpfs GUI on 4.2.3.22 running and it suddenly stopped to create new snapshots from schedule. I can manually create snapshots but none is created from schedule. How can I debug it? Kind regards Philipp Rehs -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jan 11 19:07:44 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 11 Jan 2021 19:07:44 +0000 Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Message-ID: Have you tried restarting the gpfs.gui service? At some point in the past we have seen similar and restarting the GUI made it start again. Simon From: on behalf of "Philipp.Rehs at uni-duesseldorf.de" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 11 January 2021 at 19:03 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Hello, we have gpfs GUI on 4.2.3.22 running and it suddenly stopped to create new snapshots from schedule. I can manually create snapshots but none is created from schedule. How can I debug it? Kind regards Philipp Rehs -------------- next part -------------- An HTML attachment was scrubbed... URL: From Philipp.Rehs at uni-duesseldorf.de Mon Jan 11 19:16:52 2021 From: Philipp.Rehs at uni-duesseldorf.de (Rehs, Philipp Helo) Date: Mon, 11 Jan 2021 19:16:52 +0000 Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Message-ID: Hello Simon, I have already rebooted the server but no change. I also see no calls to mmcrsnapshot in the journalctl sudo log. Maybe there is a service which is not running? Kind regards Philipp Am 11.01.2021 20:07 schrieb Simon Thompson : Have you tried restarting the gpfs.gui service? At some point in the past we have seen similar and restarting the GUI made it start again. Simon From: on behalf of "Philipp.Rehs at uni-duesseldorf.de" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 11 January 2021 at 19:03 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Hello, we have gpfs GUI on 4.2.3.22 running and it suddenly stopped to create new snapshots from schedule. I can manually create snapshots but none is created from schedule. How can I debug it? Kind regards Philipp Rehs -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Tue Jan 12 10:46:16 2021 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Tue, 12 Jan 2021 11:46:16 +0100 Subject: [gpfsug-discuss] GPFS GUI does not create snapshots In-Reply-To: References: Message-ID: Hello Philipp. there is no additional service that covers the snapshot scheduling besides the GUI service. 
Please note, that in case you have two GUI instances running, the snapshot scheduling would have moved to the second instance in case you reboot. The GUI/REST application logs are located in /var/log/cnlog/mgtsrv, but I propose to open a support case for this issue. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder IBM Systems / Lab Services Europe / EMEA Storage Competence Center Phone: +49 162 4159920 IBM Deutschland GmbH E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Sebastian Krause / Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer / Sitz der Gesellschaft: 71139 Ehningen, IBM-Allee 1 / Registergericht: Amtsgericht Stuttgart, HRB14562 From: "Rehs, Philipp Helo" To: gpfsug main discussion list Date: 11.01.2021 20:17 Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS GUI does not create snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Simon, I have already rebooted the server but no change. I also see no calls to mmcrsnapshot in the journalctl sudo log. Maybe there is a service which is not running? Kind regards Philipp Am 11.01.2021 20:07 schrieb Simon Thompson : Have you tried restarting the gpfs.gui service? At some point in the past we have seen similar and restarting the GUI made it start again. Simon From: on behalf of "Philipp.Rehs at uni-duesseldorf.de" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 11 January 2021 at 19:03 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Hello, we have gpfs GUI on 4.2.3.22 running and it suddenly stopped to create new snapshots from schedule. I can manually create snapshots but none is created from schedule. How can I debug it? Kind regards Philipp Rehs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1E685739.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From cabrillo at ifca.unican.es Tue Jan 12 14:32:23 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Tue, 12 Jan 2021 15:32:23 +0100 (CET) Subject: [gpfsug-discuss] Disk in unrecovered state Message-ID: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> Dear, Since this moning I have a couple of disk (7) in down state, I have tried to start them again but after that they change to unrecovered. These "failed" disk are only DATA. Both pool Data and Metadata has two failures groups, and set replica to 2. The Metadata disks are in two different enclosures one for each filure group. The filesystem has been unmounted , but when i have tried to run the mmfsck told me the I should remove the down disk [root at gpfs06 ~]# mmlsdisk gpfs2 -L | grep -v up disk driver sector failure holds holds storage ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- ..... 
nsd18jbod1 nsd 512 2 No Yes to be emptied unrecovered 26 data nsd19jbod1 nsd 512 2 No Yes ready unrecovered 27 data nsd19jbod2 nsd 512 3 No Yes ready down 46 data nsd24jbod2 nsd 512 3 No Yes ready down 51 data nsd57jbod1 nsd 512 2 No Yes ready down 109 data nsd61jbod1 nsd 512 2 No Yes ready down 113 data nsd71jbod1 nsd 512 2 No Yes ready down 123 data ..... Any help is welcomed. Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Jan 12 15:11:22 2021 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 12 Jan 2021 15:11:22 +0000 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> Message-ID: Definitely recommend getting a IBM Case in and ask someone for direct assistance (Zoom even). Then also check that you can access all of the underlying storage with READ ONLY operations from all defined NSD Servers in the NSD ServerList for nsd18jbod1 and nsd19jbod1. Given the name of the NSDs, sound like there is not any RAID protection on theses disks. If so then you would have serious data loss issues with one of the drives corrupted. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Iban Cabrillo Sent: Tuesday, January 12, 2021 8:32 AM To: gpfsug-discuss Subject: [gpfsug-discuss] Disk in unrecovered state [EXTERNAL EMAIL] Dear, Since this moning I have a couple of disk (7) in down state, I have tried to start them again but after that they change to unrecovered. These "failed" disk are only DATA. Both pool Data and Metadata has two failures groups, and set replica to 2. The Metadata disks are in two different enclosures one for each filure group. The filesystem has been unmounted , but when i have tried to run the mmfsck told me the I should remove the down disk [root at gpfs06 ~]# mmlsdisk gpfs2 -L | grep -v up disk driver sector failure holds holds storage ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- ..... nsd18jbod1 nsd 512 2 No Yes to be emptied unrecovered 26 data nsd19jbod1 nsd 512 2 No Yes ready unrecovered 27 data nsd19jbod2 nsd 512 3 No Yes ready down 46 data nsd24jbod2 nsd 512 3 No Yes ready down 51 data nsd57jbod1 nsd 512 2 No Yes ready down 109 data nsd61jbod1 nsd 512 2 No Yes ready down 113 data nsd71jbod1 nsd 512 2 No Yes ready down 123 data ..... Any help is welcomed. Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Jan 12 15:21:33 2021 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 12 Jan 2021 15:21:33 +0000 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> Message-ID: <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Hallo Iban, first you should check the path to the disk. (mmlsnsd -m) It seems to be broken from the OS view. This should fixed first. If you see no dev entry you have a HW problem. If this is fixed then you can start each disk individuell to see there are something start here. On wich scale version do you are? 
Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Thomas Sehn, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Iban Cabrillo Gesendet: Dienstag, 12. Januar 2021 15:32 An: gpfsug-discuss Betreff: [gpfsug-discuss] Disk in unrecovered state Dear, Since this moning I have a couple of disk (7) in down state, I have tried to start them again but after that they change to unrecovered. These "failed" disk are only DATA. Both pool Data and Metadata has two failures groups, and set replica to 2. The Metadata disks are in two different enclosures one for each filure group. The filesystem has been unmounted , but when i have tried to run the mmfsck told me the I should remove the down disk [root at gpfs06 ~]# mmlsdisk gpfs2 -L | grep -v up disk driver sector failure holds holds storage ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- ..... nsd18jbod1 nsd 512 2 No Yes to be emptied unrecovered 26 data nsd19jbod1 nsd 512 2 No Yes ready unrecovered 27 data nsd19jbod2 nsd 512 3 No Yes ready down 46 data nsd24jbod2 nsd 512 3 No Yes ready down 51 data nsd57jbod1 nsd 512 2 No Yes ready down 109 data nsd61jbod1 nsd 512 2 No Yes ready down 113 data nsd71jbod1 nsd 512 2 No Yes ready down 123 data ..... Any help is welcomed. Regards, I -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cabrillo at ifca.unican.es Tue Jan 12 15:59:03 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Tue, 12 Jan 2021 16:59:03 +0100 (CET) Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Message-ID: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> Hi Renar, The version we are installed is 5.0.4-3, and the paths to these wrong disks seems to be fine: [root at gpfs06 ~]# mmlsnsd -m| grep nsd18jbod1 nsd18jbod1 0A0A00675EE76CF5 /dev/sds gpfs05.ifca.es server node nsd18jbod1 0A0A00675EE76CF5 /dev/sdby gpfs06.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd19jbod1 nsd19jbod1 0A0A00665EE76CF6 /dev/sdt gpfs05.ifca.es server node nsd19jbod1 0A0A00665EE76CF6 /dev/sdaa gpfs06.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd19jbod2 nsd19jbod2 0A0A00695EE79A12 /dev/sdt gpfs07.ifca.es server node nsd19jbod2 0A0A00695EE79A12 /dev/sdat gpfs08.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd24jbod2 nsd24jbod2 0A0A00685EE79749 /dev/sdbn gpfs07.ifca.es server node nsd24jbod2 0A0A00685EE79749 /dev/sdcg gpfs08.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd57jbod1 nsd57jbod1 0A0A00665F243CE1 /dev/sdbg gpfs05.ifca.es server node nsd57jbod1 0A0A00665F243CE1 /dev/sdbx gpfs06.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd61jbod1 nsd61jbod1 0A0A00665F243CFA /dev/sdbk gpfs05.ifca.es server node nsd61jbod1 0A0A00665F243CFA /dev/sdy gpfs06.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd71jbod1 nsd71jbod1 0A0A00665F243D38 /dev/sdbu gpfs05.ifca.es server node nsd71jbod1 0A0A00665F243D38 /dev/sdbv gpfs06.ifca.es server node trying to start 19jbod1 again: [root at gpfs06 ~]# mmchdisk gpfs2 start -d nsd19jbod1 mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. gpfs06.ifca.es: Rediscovered nsd server access to nsd19jbod1. gpfs05.ifca.es: Rediscovered nsd server access to nsd19jbod1. Failed to open gpfs2. Log recovery failed. Input/output error Initial disk state was updated successfully, but another error may have changed the state again. mmchdisk: Command failed. Examine previous error messages to determine cause. Regards, I From olaf.weiser at de.ibm.com Tue Jan 12 16:30:24 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 12 Jan 2021 16:30:24 +0000 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> References: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es>, <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es><3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Message-ID: An HTML attachment was scrubbed... URL: From nikhilk at us.ibm.com Tue Jan 12 17:32:08 2021 From: nikhilk at us.ibm.com (Nikhil Khandelwal) Date: Tue, 12 Jan 2021 17:32:08 +0000 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: References: , <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es>, <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es><3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Message-ID: An HTML attachment was scrubbed... 
URL: From cabrillo at ifca.unican.es Wed Jan 13 10:23:20 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Wed, 13 Jan 2021 11:23:20 +0100 (CET) Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: References: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Message-ID: <538725591.388718.1610533400379.JavaMail.zimbra@ifca.unican.es> Hi Guys, Devices seems to be accesible from both server primary and secondary, and thr harware state is "Optimal" [root at gpfs05 ~]# mmlsnsd -m| grep nsd18jbod1 nsd18jbod1 0A0A00675EE76CF5 /dev/sds gpfs05.ifca.es server node nsd18jbod1 0A0A00675EE76CF5 /dev/sdby gpfs06.ifca.es server node [root at gpfs05 ~]# #dd if=/dev/sds [root at gpfs05 ~]# man od [root at gpfs05 ~]# dd if=/dev/sds bs=4k count=2 | od -c 2+0 records in 2+0 records out 8192 bytes (8.2 kB) copied, 0.000249162 s, 32.9 MB/s 0000000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0000700 001 \0 356 376 377 377 001 \0 \0 \0 377 377 377 377 \0 \0 0000720 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0000760 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 U 252 0001000 E F I P A R T \0 \0 001 \0 \ \0 \0 \0 0001020 \r 0 267 u \0 \0 \0 \0 001 \0 \0 \0 \0 \0 \0 \0 0001040 257 * 201 243 003 \0 \0 \0 " \0 \0 \0 \0 \0 \0 \0 0001060 216 * 201 243 003 \0 \0 \0 240 ! 302 3 . R \f M 0001100 200 241 323 024 245 h | G 002 \0 \0 \0 \0 \0 \0 \0 0001120 200 \0 \0 \0 200 \0 \0 \0 p b 203 F \0 \0 \0 \0 0001140 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0002000 220 374 257 7 } 357 226 N 221 303 - z 340 U 261 t 0002020 316 343 324 ' } 033 K C 203 a 314 = 220 k 336 023 0002040 0 \0 \0 \0 \0 \0 \0 \0 177 * 201 243 003 \0 \0 \0 0002060 001 \0 \0 \0 \0 \0 \0 @ G \0 P \0 F \0 S \0 0002100 : \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 0002120 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0020000 Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Wed Jan 13 11:51:44 2021 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Wed, 13 Jan 2021 12:51:44 +0100 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es><3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> Message-ID: Hi Iban, given that you have physical access to the disks and they are readable ( i see you checked that via dd command) you should mmchdisk start them. Note: as you have down disks in more than one FG, you will need to be able to at least get one good copy of the metadata readable .. in order to be able to mmchdisk start a disk. 
In that case i would run : mmchdisk start -a (so gpfs can get data from all readable disks) Mit freundlichen Gr??en / Kind regards Achim Rehor Remote Technical Support Engineer Storage IBM Systems Storage Support - EMEA Storage Competence Center (ESCC) Spectrum Scale / Elastic Storage Server ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49-170-4521194 E-Mail: Achim.Rehor at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Sebastian Krause Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 gpfsug-discuss-bounces at spectrumscale.org wrote on 12/01/2021 16:59:03: > From: Iban Cabrillo > To: gpfsug-discuss > Date: 12/01/2021 16:59 > Subject: [EXTERNAL] Re: [gpfsug-discuss] Disk in unrecovered state > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Hi Renar, > The version we are installed is 5.0.4-3, and the paths to these > wrong disks seems to be fine: > > [root at gpfs06 ~]# mmlsnsd -m| grep nsd18jbod1 > nsd18jbod1 0A0A00675EE76CF5 /dev/sds gpfs05.ifca.es > server node > nsd18jbod1 0A0A00675EE76CF5 /dev/sdby gpfs06.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd19jbod1 > nsd19jbod1 0A0A00665EE76CF6 /dev/sdt gpfs05.ifca.es > server node > nsd19jbod1 0A0A00665EE76CF6 /dev/sdaa gpfs06.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd19jbod2 > nsd19jbod2 0A0A00695EE79A12 /dev/sdt gpfs07.ifca.es > server node > nsd19jbod2 0A0A00695EE79A12 /dev/sdat gpfs08.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd24jbod2 > nsd24jbod2 0A0A00685EE79749 /dev/sdbn gpfs07.ifca.es > server node > nsd24jbod2 0A0A00685EE79749 /dev/sdcg gpfs08.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd57jbod1 > nsd57jbod1 0A0A00665F243CE1 /dev/sdbg gpfs05.ifca.es > server node > nsd57jbod1 0A0A00665F243CE1 /dev/sdbx gpfs06.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd61jbod1 > nsd61jbod1 0A0A00665F243CFA /dev/sdbk gpfs05.ifca.es > server node > nsd61jbod1 0A0A00665F243CFA /dev/sdy gpfs06.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd71jbod1 > nsd71jbod1 0A0A00665F243D38 /dev/sdbu gpfs05.ifca.es > server node > nsd71jbod1 0A0A00665F243D38 /dev/sdbv gpfs06.ifca.es > server node > > trying to start 19jbod1 again: > [root at gpfs06 ~]# mmchdisk gpfs2 start -d nsd19jbod1 > mmnsddiscover: Attempting to rediscover the disks. This may take awhile ... > mmnsddiscover: Finished. > gpfs06.ifca.es: Rediscovered nsd server access to nsd19jbod1. > gpfs05.ifca.es: Rediscovered nsd server access to nsd19jbod1. > Failed to open gpfs2. > Log recovery failed. > Input/output error > Initial disk state was updated successfully, but another error may > have changed the state again. > mmchdisk: Command failed. Examine previous error messages to determine cause. 
> > Regards, I > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > INVALID URI REMOVED > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=f4oAWXtPlhIm5cEShA0Amlf1ZUG3PyXvVbzB9e- > I3hk&s=SA1wXw8XXPjvMbSU6TILc2vnC4KxkfoboM8RolqBmuc&e= > From cabrillo at ifca.unican.es Wed Jan 13 12:26:17 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Wed, 13 Jan 2021 13:26:17 +0100 (CET) Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> Message-ID: <1691473688.446197.1610540777278.JavaMail.zimbra@ifca.unican.es> Thanks a lot!! Guys, this do the trick Now, whole disks are up again and the FS has been mounted without troubles, Cheers, I From anacreo at gmail.com Wed Jan 20 11:09:27 2021 From: anacreo at gmail.com (Alec) Date: Wed, 20 Jan 2021 03:09:27 -0800 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data Message-ID: We have AIX and Spectrum Scale 5.1 and are compressing older data. We can compress data at about 10GB/minute and decompress data wicked fast using mmchattr, when a user reads data from a compressed file via application open / read calls.... it moves at about 5MB/s. Normally our I/O pipeline allows for 2400MB/s on a single file read. What can we look at to speed up the read of the compressed data, are there any tunables that might affect this? As it is now if the backup daemon is backing up a compressed file, it can get stuck for hours, I will go and mmchattr to decompress the file, within a minute the file is decompressed, and backed up, then I simply recompress the file once backup has moved on. Any advice on how to improve the compressed reads under AIX would be very helpful. Alec -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Wed Jan 20 11:59:39 2021 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 20 Jan 2021 11:59:39 +0000 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jan 20 14:47:07 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 20 Jan 2021 15:47:07 +0100 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data In-Reply-To: References: Message-ID: This sounds like a bug to me... (I wouldn't expect mmchattr works on different node than other file access). I would check "mmdiag --iohist verbose" during these slow reads, to see if it gives a hint at what it's doing, versus what it shows during "mmchattr". Maybe one is triggering prefetch, while the other is some kind of random IO ? Also might be worth to try a mmtrace. Compare the traces for mmtrace start trace="all 0 vnode 1 vnop 1 io 1" cat compressedLargeFile mmtrace stop vs.: mmtrace start trace="all 0 vnode 1 vnop 1 io 1" mmchattr --compress no someLargeFile mmtrace stop (but please make sure that the file wasn't already uncompressed in pagepool in this second run). -jf On Wed, Jan 20, 2021 at 12:59 PM Daniel Kidger wrote: > I think you need to think about which node the file is being decompressed > on (and if that node has plenty of space in the page pool.) 
> iirc mmchattr works on one of the 'manager' nodes not necessarily the node > you typed the command on? > Daniel > > _________________________________________________________ > *Daniel Kidger Ph.D.* > IBM Technical Sales Specialist > Spectrum Scale, Spectrum Discover and IBM Cloud Object Storage > > +44-(0)7818 522 266 > daniel.kidger at uk.ibm.com > > > > > > > > > > ----- Original message ----- > From: Alec > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale 5 and Reading > Compressed Data > Date: Wed, Jan 20, 2021 11:10 > > We have AIX and Spectrum Scale 5.1 and are compressing older data. > > We can compress data at about 10GB/minute and decompress data wicked fast > using mmchattr, when a user reads data from a compressed file via > application open / read calls.... it moves at about 5MB/s. Normally our > I/O pipeline allows for 2400MB/s on a single file read. > > What can we look at to speed up the read of the compressed data, are there > any tunables that might affect this? > > As it is now if the backup daemon is backing up a compressed file, it can > get stuck for hours, I will go and mmchattr to decompress the file, within > a minute the file is decompressed, and backed up, then I simply recompress > the file once backup has moved on. > > Any advice on how to improve the compressed reads under AIX would be very > helpful. > > Alec > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=USgQqOp8HDCg0DYjdjSVFvVOwq1rMgRYPP_hoZqgUyI&s=_hdEB3EvWW-8ZzdS1D1roh92-AicdrVMywJwQGlKTIQ&e= > > > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Wed Jan 20 22:10:39 2021 From: anacreo at gmail.com (Alec) Date: Wed, 20 Jan 2021 14:10:39 -0800 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data Message-ID: I see a lot of references to the page pool. Our page pool is only 8 gb and our files can be very large into the terrabytes. I will try increasing the page pool in dev to 2x a test file and see if the problem resolves. Any documentation on the correlation here would be nice. I will see if I can get rights for the debug as well. Alec -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Fri Jan 22 11:44:56 2021 From: anacreo at gmail.com (Alec) Date: Fri, 22 Jan 2021 03:44:56 -0800 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data In-Reply-To: References: Message-ID: When comparing compression performance I see the following performance, is anyone else getting significantly higher on any other systems? 
Read Speeds: lz4 with null fill data, ~ 90MB/s lz4 with a SAS data set, ~40-50MB/s z with null fill data, ~ 15MB/s z with a SAS data set, ~ 5MB/s While on a 4G page pool I tested each of these file sizes and got roughly identical performance in all cases: 1 GB, 5 GB, and 10GB. This was on an S824 (p8) with read performance typically going to 1.2GB/s of read on a single thread (non-compressed). Doing a "very limited test" in Production hardware E850, 8gb Page Pool, with ~2.4 GB/s of read on a single thread (non-compressed) I got very similar results. In all cases the work was done from the NSD master, and due to the file sizes and the difference in page pool, i'd expect the 1gb files to move at a significantly faster pace if pagepool was a factor. If anyone could tell me what performance they get on their platform and what OS or Hardware they're using, I'd very much be interested. I'm debating if using GPFS to migrate the files to a .gz compressed version, and then providing a fifo mechanism to pipe through the compressed data wouldn't be a better solution. Alec On Wed, Jan 20, 2021 at 2:10 PM Alec wrote: > I see a lot of references to the page pool. Our page pool is only 8 gb and > our files can be very large into the terrabytes. > > I will try increasing the page pool in dev to 2x a test file and see if > the problem resolves. > > Any documentation on the correlation here would be nice. > > I will see if I can get rights for the debug as well. > > Alec > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cabrillo at ifca.unican.es Wed Jan 27 13:20:08 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Wed, 27 Jan 2021 14:20:08 +0100 (CET) Subject: [gpfsug-discuss] cannot unmount fs Message-ID: <1692854831.1435321.1611753608516.JavaMail.zimbra@ifca.unican.es> Dear, We have a couple of GPFS fs, gpfs mount on /gpfs and gpfs2 mount on /gpfs/external, the problem is the mount path of the second fs sometimes is missied I am trying to mmumount this FS in order to change the mount path. but I cann't. If I make a mmumont gpfs2 or mmumount /gpfs/external I get this error: [root at gpfsgui ~]# mmumount gpfs2 Wed Jan 27 14:11:07 CET 2021: mmumount: Unmounting file systems ... umount: /gpfs/external: not mounted (/gpfs/external path exists) If I try to mmchfs -T XXX , the system says that the FS is already mounted. But there is no error in the logs. Any Idea? Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Jan 27 13:28:44 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 27 Jan 2021 13:28:44 +0000 Subject: [gpfsug-discuss] cannot unmount fs In-Reply-To: <1692854831.1435321.1611753608516.JavaMail.zimbra@ifca.unican.es> References: <1692854831.1435321.1611753608516.JavaMail.zimbra@ifca.unican.es> Message-ID: An HTML attachment was scrubbed... URL: From owen.morgan at motionpicturesolutions.com Wed Jan 27 17:14:45 2021 From: owen.morgan at motionpicturesolutions.com (Owen Morgan) Date: Wed, 27 Jan 2021 17:14:45 +0000 Subject: [gpfsug-discuss] General Introduction Message-ID: Hi Everyone, First off thanks for this user group existing! I've already watched a load of the great webinars that were uploaded to YouTube! My name is Owen Morgan and I'm currently the 'Archivist' at Motion Picture Solutions in the UK. MPS is a post-production and distribution facility for the major studios and a multitude of smaller studios. 
Their main area of operation is mastering and localisation of feature films along with trailer creation etc.. They also then have a combined Hard drive and Internet based distribution arm that can distribute all that content to all cinemas in the UK and, with a huge number of growing partners and co-investors, globally as well. My role started of primarily as just archiving data to tar based LTO tapes, but in recent times has moved to using Spectrum Scale and Spectrum Archive and now to pretty much managing those systems from a sysadmin level. Recently MPS invested in a Spectrum Scale system for their content network, and again, I'm starting to take over management of that both on a ILM perspective and actively involved with maintenance and support. Enough about me. I have a 'first question' but will send that over separately over the next day or so to stop this email being a novella! Thanks and nice to meet people! Owen. [Sent from Front] Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 -------------- next part -------------- An HTML attachment was scrubbed... URL: From owen.morgan at motionpicturesolutions.com Wed Jan 27 22:17:09 2021 From: owen.morgan at motionpicturesolutions.com (Owen Morgan) Date: Wed, 27 Jan 2021 22:17:09 +0000 Subject: [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... Message-ID: Hi Everyone, First question from me I appreciate this is policy engine thing as opposed to more fundamental Spectrum Scale so hope its ok! I'm trying to find a 'neat' way within a couple of policy rules to measure different time intervals (in days) but solely interested in WEEK DAYS only (ie delete files older than X week days only). An example is one of the rules a team would like implemented is delete all files older than 10 business days (ie week days only. We are ignoring public holidays as if they don't exist). Followed by a separate rule for a different folder of deleting all files older than 4 business days. The only way I've been able to facilitate this so far for the 4 business days is to separate out Fridays as a separate rule (because Friday - 4 days are all week days), then a separate rule for Monday through Thursday (because timestamp - 4 days has to factor in weekends, so has to actually set the INTERVAL to 6 days). Likewise for the 10 days rule I have to have a method to separate out Monday - Wednesday, and Thursday and Friday. I feel my 'solution', which does work, is extremely messy and not ideal should they want to add more rules as it just makes the policy file very long full of random definitions for all the different scenarios. So whilst the 'rules' are simple thanks to definitions, its the definitions themselves that are stacking up... depending on the interval required I have to create a unique set of is_weekday definitions and unique is_older_than_xdays definitions. 
here is a snippet of the policy:

define( is_older_than_4days,
  ( (CURRENT_TIMESTAMP - CREATION_TIME) >= INTERVAL '4' DAYS ) )

define( is_older_than_6days,
  ( (CURRENT_TIMESTAMP - CREATION_TIME) >= INTERVAL '6' DAYS ) )

define( is_weekday_ex_fri,
  ( DAYOFWEEK(CURRENT_DATE) IN (2,3,4,5) ) )

define( is_fri,
  ( DAYOFWEEK(CURRENT_DATE) = 6 ) )

RULE 'rule name' WHEN is_weekday_ex_fri
  DELETE WHERE include_list /* an include list just not added above */
  AND is_older_than_6days

RULE 'rule name' WHEN is_fri
  DELETE WHERE include_list /* an include list just not added above */
  AND is_older_than_4days

Are there any neater, more concise ways of calculating an INTERVAL of X weekdays only that can easily be extended to any permutation of intervals required? I'm not sure how much SQL you can shoehorn into a policy before mmapplypolicy / the policy engine isn't happy.

Thanks in advance,

Owen. [Sent from Front]

Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From owen.morgan at motionpicturesolutions.com  Thu Jan 28 14:27:35 2021
From: owen.morgan at motionpicturesolutions.com (Owen Morgan)
Date: Thu, 28 Jan 2021 14:27:35 +0000
Subject: [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation....
In-Reply-To: <1360632-1611790655.643971@r36M.X7Dl.WWDV>
References: , <1360632-1611790655.643971@r36M.X7Dl.WWDV>
Message-ID: 

Mark,

Thank you for taking the time to comment, I genuinely appreciate it!

I will digest and look at the mmfind examples (to be honest, didn't know it was a thing.....). Everything I know about Spectrum Scale (and Spectrum Archive) has been self taught so...... I'm pretty sure I'm missing Soooooooooo much useful info! I wish there was like a dummies guide (I've read the redbooks and admin guides as best I can but I know my knowledge is patchy at best)!

Once digested I may, or may not, have further questions but I genuinely thank you for your assistance.

Owen. [Sent from Front]

Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482

On Wed, Jan 27 at 11:53 pm, > mark.bergman at uphs.upenn.edu wrote:
In the message dated: Wed, 27 Jan 2021 22:17:09 +0000, The pithy ruminations from Owen Morgan on [[External] [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation....] were:

=> Hi Everyone,
=>
=> First question from me I appreciate this is policy engine thing as
=> opposed to more fundamental Spectrum Scale so hope its ok!

It's great.

=>
=> I'm trying to find a 'neat' way within a couple of policy rules to
=> measure different time intervals (in days) but solely interested in WEEK
=> DAYS only (ie delete files older than X week days only).

Policy SQL syntax gives me a headache. For this kind of task, I find that mmfind is your friend -- it's in the "examples" source dir within /usr/lpp/mmfs. Trivial to compile & install. Easier to debug, and it will generate the SQL.
=>
=> An example is one of the rules a team would like implemented is delete
=> all files older than 10 business days (ie week days only. We are

What about "delete all files older than 12 calendar days" -- by definition, those files are older than 10 business days as well.

=> ignoring public holidays as if they don't exist). Followed by a separate
=> rule for a different folder of deleting all files older than 4 business
=> days.

Or, older than 6 calendar days. Or, run this nightly:

#! /bin/bash
dateOffset=0
if [ `date '+%u'` -le 4 ] ; then    # Mon=1, Tue=2, Wed=3, Thu=4
        #
        # For a file to be more than 4 business days old on-or-before the
        # 4th day of the week, it must span the weekend, so offset the number
        # of required days in the file age
        dateOffset=2
fi
mmfind -mtime $((4 + $dateOffset)) /path/to/Nuke/After/4/Days -xarg rm -f

=>
=> Thanks in advance,
=>
=> Owen. [Sent from Front]
=>
=> Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E:
=> owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com
=> A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture
=> Solutions Ltd is a company registered in England and Wales under number
=> 5388229, VAT number 201330482
=>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jamervi at sandia.gov  Thu Jan 28 18:26:37 2021
From: jamervi at sandia.gov (Mervini, Joseph A)
Date: Thu, 28 Jan 2021 18:26:37 +0000
Subject: [gpfsug-discuss] Number of vCPUs exceeded
Message-ID: <59193954-B649-4DF5-AD21-652922E49FD9@contoso.com>

Hi,

I haven't seen this before, but one of my remote cluster users reported that the system in question is experiencing high loads and that Scale is unmounting the file system. This is the output she is seeing:

Wed Jan 27 22:18:34.168 2021: [I] GPFS vCPU limits: Low warning limit 3 vCPUs, High warning limit 256 vCPUs, Hard limit 1536 vCPUs.
Wed Jan 27 22:18:34.169 2021: [I] GPFS vCPU limits include all vCPUs that Linux sees as online or possibly online via hot add, ht/smt changes, etc.
Wed Jan 27 22:18:34.170 2021: [X] GPFS detected 1792 vCPUs. This exceeds the warning limit of 256 vCPUs and the hard limit of 1536 vCPUs. GPFS will shutdown

Any help will be appreciated.

Thanks,

Joe

====
Joe Mervini
Sandia National Laboratories
High Performance Computing
505.844.6770
jamervi at sandia.gov

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mzp at us.ibm.com  Thu Jan 28 18:42:56 2021
From: mzp at us.ibm.com (Madhav Ponamgi1)
Date: Thu, 28 Jan 2021 13:42:56 -0500
Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18
In-Reply-To: 
References: 
Message-ID: 

To calculate this directly (if you don't want to depend on a utility) consider the following steps. There are many more such algorithms in the wonderful book Calendrical Calculations.

1. Take the last two digits of the year.
2. Divide by 4, discarding any fraction.
3. Add the day of the month.
4. Add the month's key value:  JFM AMJ JAS OND
                               144 025 036 146
5. Subtract 1 for January or February of a leap year.
6. For a Gregorian date, add 0 for 1900's, 6 for 2000's, 4 for 1700's, 2 for 1800's; for other years, add or subtract multiples of 400.
7. For a Julian date, add 1 for 1700's, and 1 for every additional century you go back.
8. Add the last two digits of the year.
9. Divide by 7 and take the remainder.
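A rough bash transcription of the steps (just a sketch, not tested beyond a couple of dates; it only handles the Gregorian 1900's/2000's century adjustment, and the result codes are 0=Sat, 1=Sun, ..., 6=Fri):

#!/bin/bash
# Sketch of the key-value method above (Gregorian dates, 1900's/2000's only).
# Month keys Jan..Dec: 1 4 4 0 2 5 0 3 6 1 4 6
dow() {
    local y=$1 m=$2 d=$3
    local keys=(0 1 4 4 0 2 5 0 3 6 1 4 6)
    local yy=$((y % 100))
    local sum=$(( yy / 4 + d + keys[m] ))
    # subtract 1 for January or February of a leap year
    if (( (m == 1 || m == 2) && ( (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 ) )); then
        sum=$((sum - 1))
    fi
    (( y >= 2000 )) && sum=$((sum + 6))   # 1900's add 0, 2000's add 6
    sum=$(( sum + yy ))
    echo $(( sum % 7 ))
}

dow 2021 1 28    # prints 5 -> Thursday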
--- Madhav
mzp at us.ibm.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: 

From knop at us.ibm.com  Thu Jan 28 18:55:36 2021
From: knop at us.ibm.com (Felipe Knop)
Date: Thu, 28 Jan 2021 18:55:36 +0000
Subject: [gpfsug-discuss] Number of vCPUs exceeded
In-Reply-To: <59193954-B649-4DF5-AD21-652922E49FD9@contoso.com>
References: <59193954-B649-4DF5-AD21-652922E49FD9@contoso.com>
Message-ID: 

An HTML attachment was scrubbed...
URL: 

From UWEFALKE at de.ibm.com  Thu Jan 28 19:54:38 2021
From: UWEFALKE at de.ibm.com (Uwe Falke)
Date: Thu, 28 Jan 2021 20:54:38 +0100
Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18
In-Reply-To: 
References: 
Message-ID: 

sounds quite complicated. if all public holidays can be ignored it is simple: the algorithm has only to run on week days (the effective age of files would not change on weekend days).

To find the latest date to remove files:

Now, enumerate the weekdays, starting with Mon=1
If your max age is T, find the integer multiple of 5 and the remainder such that T = T_i*5 + R
Determine the current DoW in terms of your enumeration.
if DoW - R > 0, your max age date is Dx = D - (R + 7*T_i)
else your max age date is Dx = D - (R + 2 + 7*T_i)

dates can be easily compiled in epoch, like D_e=$(date +%s), and
Dx_e = D_e - 86400*(R + 7*T_i) or Dx_e = D_e - 86400*(R + 2 + 7*T_i)

you then need to convert the found epoch time back into a christian date which could be done by date --date='@

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From mzp at us.ibm.com  Fri Jan 29 12:38:37 2021
From: mzp at us.ibm.com (Madhav Ponamgi1)
Date: Fri, 29 Jan 2021 07:38:37 -0500
Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 20
In-Reply-To: 
References: 
Message-ID: 

Here is a simple C function posted from comp.lang.c many years ago that works for a restricted range (year > 1752) based on the algorithm I described earlier.

dayofweek(y, m, d)
{
    y -= m < 3;
    return (y + y/4 - y/100 + y/400 + "-bed=pen+mad."[m] + d) % 7;
}

--- Madhav
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: 

From jpr9c at virginia.edu  Fri Jan 29 19:47:13 2021
From: jpr9c at virginia.edu (Ruffner, Scott (jpr9c))
Date: Fri, 29 Jan 2021 19:47:13 +0000
Subject: [gpfsug-discuss] Adding client nodes using a shared NFS root image.
Message-ID: <4A332838-9D59-477D-AAE2-F79F8AAD143B@virginia.edu>

Hi everyone,

We want all of our compute nodes (bare metal) to directly participate in the cluster as client nodes; of course, they are sharing a common root image.

Adding nodes via the regular mmaddnode (with the dsh operation to replicate files to the clients) isn't really viable, but if I short-circuit that, and simply generate the /var/mmfs/gen files and then manually copy those and the keyfiles to the shared root images, is that safe?

Am I going about this the entirely wrong way?

--
Scott Ruffner
Senior HPC Engineer
UVa Research Computing
(434)924-6778(o)
(434)295-0250(h)
sruffner at virginia.edu

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From david_johnson at brown.edu  Fri Jan 29 19:52:04 2021
From: david_johnson at brown.edu (david_johnson at brown.edu)
Date: Fri, 29 Jan 2021 14:52:04 -0500
Subject: [gpfsug-discuss] Adding client nodes using a shared NFS root image.
In-Reply-To: <4A332838-9D59-477D-AAE2-F79F8AAD143B@virginia.edu>
References: <4A332838-9D59-477D-AAE2-F79F8AAD143B@virginia.edu>
Message-ID: <094EDEFE-4B15-4214-90C4-CD83BC76A10A@brown.edu>

We use mmsdrrestore after the node boots. In our case these are diskless nodes provisioned by xCAT. The post install script takes care of ensuring infiniband is lit up, and does the mmsdrrestore followed by mmstartup.

  -- ddj
  Dave Johnson

> On Jan 29, 2021, at 2:47 PM, Ruffner, Scott (jpr9c) wrote:
>
> ?
> Hi everyone, > > We want all of our compute nodes (bare metal) to directly participate in the cluster as client nodes; of course, they are sharing a common root image. > > Adding nodes via the regular mmaddnode (with the dsh operation to replicate files to the clients) isn?t really viable, but if I short-circuit that, and simply generate the /var/mmfs/gen files and then manually copy those and the keyfiles to the shared root images, is that safe? > > Am I going about this the entirely wrong way? > > -- > Scott Ruffner > Senior HPC Engineer > UVa Research Computing > (434)924-6778(o) > (434)295-0250(h) > sruffner at virginia.edu > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpr9c at virginia.edu Fri Jan 29 20:04:32 2021 From: jpr9c at virginia.edu (Ruffner, Scott (jpr9c)) Date: Fri, 29 Jan 2021 20:04:32 +0000 Subject: [gpfsug-discuss] Adding client nodes using a shared NFS root image. In-Reply-To: <094EDEFE-4B15-4214-90C4-CD83BC76A10A@brown.edu> References: <4A332838-9D59-477D-AAE2-F79F8AAD143B@virginia.edu> <094EDEFE-4B15-4214-90C4-CD83BC76A10A@brown.edu> Message-ID: <6A72D8F2-65ED-431C-B13F-3D4F189A53DF@virginia.edu> Thanks David! Slick solution. -- Scott Ruffner Senior HPC Engineer UVa Research Computing (434)924-6778(o) (434)295-0250(h) sruffner at virginia.edu From: on behalf of "david_johnson at brown.edu" Reply-To: gpfsug main discussion list Date: Friday, January 29, 2021 at 2:52 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Adding client nodes using a shared NFS root image. We use mmsdrrestore after the node boots. In our case these are diskless nodes provisioned by xCAT. The post install script takes care of ensuring infiniband is lit up, and does the mmsdrrestore followed by mmstartup. -- ddj Dave Johnson On Jan 29, 2021, at 2:47 PM, Ruffner, Scott (jpr9c) wrote: Hi everyone, We want all of our compute nodes (bare metal) to directly participate in the cluster as client nodes; of course, they are sharing a common root image. Adding nodes via the regular mmaddnode (with the dsh operation to replicate files to the clients) isn?t really viable, but if I short-circuit that, and simply generate the /var/mmfs/gen files and then manually copy those and the keyfiles to the shared root images, is that safe? Am I going about this the entirely wrong way? -- Scott Ruffner Senior HPC Engineer UVa Research Computing (434)924-6778(o) (434)295-0250(h) sruffner at virginia.edu _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From owen.morgan at motionpicturesolutions.com Sat Jan 30 00:31:27 2021 From: owen.morgan at motionpicturesolutions.com (Owen Morgan) Date: Sat, 30 Jan 2021 00:31:27 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 Message-ID: Hi all, Sorry I appear to have missed a load of replies and screwed up the threading thing when looking online... not used to this email group thing! Might look at the slack option! 
Just wanted to clarify my general issue a bit:

So the methodology I've started to implement is per-department policy files, where all rules related to managing a specific team's assets are all in one policy file, and then I have fine control over when and how each department's rules run and potentially (if it mattered) in what order etc.

So team A want me to manage two folders where in folder 1a all files older than 4 week days of age are deleted, and in folder 1b all files older than 8 week days are deleted.

They now want me to manage a different set of two folders with two different "thresholds" for how old they need to be in week days before they delete (ie. I now need additional rules for folders 2a and 2b).

The issue is that for each scenario there is a different 'offset' required depending on the day of the week the policy is run, to maintain the number of weekdays required (the 'threshold' is always in weekdays, so intervening weekends need to be added to take them into account).

For instance when run on a Monday, if the threshold were 4 weekdays of age, I need to be deleting files that were created on the previous Tuesday. Which is 6 days (ie 4 days + 2 weekend days). If the threshold was 8 week days the threshold in terms of the policy would be 12 (ie 8 plus 2x 2 weekend days).

The only way I was able to work this out in the SQL-like policy file was to split the week days into groups where the offset would be the same (so for 4 week days, Monday through Thursday share the offset of 2 - which then has to be added to the 4 for the desired result) and then a separate rule for the Friday.

However for every addition of a different threshold I have to write all new groups to match the days etc.. so the policy ends up with 6 rules but 150 lines of definition macros....

I was trying to work out if there was a more concise way of, within the SQL-like framework, programmatically calculating the day offset that needs to be added to the threshold, to allow a more generic function that could just automatically work it out....

The algorithm I have recently thought up is to effectively calculate the difference in weeks between the current run time and the desired deletion day and multiply it by 2.

Pseudocode it would be (threshold is the number of week days for the rule, offset is the number that needs to be added to account for the weekends between those dates):

If current day of month - threshold = Sunday, then add 1 to the threshold value (Sundays are denoted as the week start, so Saturday would represent the previous week).

Offset = (difference between current week and week of (current day of month - threshold)) x 2

A worked example:

Threshold = 11 week days
Policy run on the 21st Jan, which is week 4 of 2021

21st - 11 days = Sunday 10th

Therefore need to add 1 to the threshold to push the day into the previous week. New threshold is 12

Saturday 9th is in week 2 of 2021, so the offset is week 4 - week 2 = 2 (ie difference in weeks) x 2, which is 4.

Add 4 to the original 11 to make 15.

So for the policy running on the 21st Jan, to delete only files older than 11 week days of age I need to set my rule to be

DELETE WHERE ((CURRENT_DATE - CREATION_TIME) >= INTERVAL '15' DAYS)

Unfortunately, I'm now struggling to implement that algorithm..... it seems the SQL-ness is very limited and I can't declare variables to use or stuff.... it's a shame, as that algorithm is generic so only needs to be written once and you could have as many unique rules as you want, all with different thresholds etc...
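One thing I have been wondering about (just a sketch, and it assumes the policy only gets kicked off on weekdays; the AGE_LIMIT macro name, the paths and the policy file below are made-up placeholders) is doing the weekday arithmetic in a small wrapper script and handing the result to mmapplypolicy as a macro with -M, so the policy file itself stays generic:

#!/bin/bash
# Sketch: turn "N weekdays" into a calendar-day limit, then pass it to the
# policy as the AGE_LIMIT macro (names and paths here are placeholders).
weekdays=4                        # business-day threshold for this rule
dow=$(date +%u)                   # 1=Mon ... 7=Sun
full_weeks=$(( weekdays / 5 ))
rest=$(( weekdays % 5 ))
extra=$(( full_weeks * 2 ))       # every full block of 5 weekdays spans a weekend
if (( rest >= dow )); then        # remaining weekdays reach back past Monday
    extra=$(( extra + 2 ))
fi
limit=$(( weekdays + extra ))
mmapplypolicy /gpfs/dept1/folder1a -P /opt/policies/dept1.pol -M AGE_LIMIT=$limit -I yes

The rule would then only need something like WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME)) >= AGE_LIMIT, so a new threshold is just a change in the wrapper rather than another stack of definitions. For the worked example above (11 weekdays, run on Thursday the 21st) it comes out at 15, and for 4 weekdays on a Monday it comes out at 6.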
Is there another way to get the same results?

I would prefer to stay in the bounds of the SQL policy rule setup as that is the framework I have created and started to implement.

Hope the above gives more clarity to what I'm asking.... sorry if one of the previous replies addresses this; if it does, I clearly was confused by the response (I seriously feel like an amateur at this at the moment and am having to learn all these finer things as I go).

Thanks in advance,

Owen.

Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From anacreo at gmail.com  Sat Jan 30 02:53:49 2021
From: anacreo at gmail.com (Alec)
Date: Fri, 29 Jan 2021 18:53:49 -0800
Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18
In-Reply-To: 
References: 
Message-ID: 

Based on the problem you have, I would write an mmfind / mmxarg command that sets a custom attr such as purge.after, and have a ksh/perl/php script that simply makes the necessary calculations using all the tricks it has... Skip files that already have the attribute set, or are too new to bother having the attribute.

Then use a single purge policy to query all files that have a purge.after set to the appropriate datestamp.

You could get way more concise with this mechanism and have a much simpler process.

Alec

On Fri, Jan 29, 2021 at 4:32 PM Owen Morgan < owen.morgan at motionpicturesolutions.com> wrote:
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From anacreo at gmail.com  Sat Jan 30 03:07:24 2021
From: anacreo at gmail.com (Alec)
Date: Fri, 29 Jan 2021 19:07:24 -0800
Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18
In-Reply-To: 
References: 
Message-ID: 

Also a caution on this... you may want to retain the file's modified time in something like purge.modified... so you can also re-calc for files where purge.modified != file modified time. Else you may purge something too early.

Alec

On Fri, Jan 29, 2021 at 6:53 PM Alec wrote:
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From owen.morgan at motionpicturesolutions.com  Sat Jan 30 03:39:42 2021
From: owen.morgan at motionpicturesolutions.com (Owen Morgan)
Date: Sat, 30 Jan 2021 03:39:42 +0000
Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18
Message-ID: 

Alec,

Thank you for your response!

I get it now! And I also understand some of the other people's responses better as well!

Not only does this make sense, I also suppose it shows I have to broaden my 'ideas' as to what tools are available and can be used, more than just mmapplypolicy and policy files alone. Using the power of all of them provides more ability than just focusing on one!
Just want to thank you, and the other respondents as you've genuinely helped me and I've learnt new things in the process (until I posted the original question I didn't even know mmfind was a thing!) Thanks! Owen. Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Sat Jan 30 04:40:44 2021 From: anacreo at gmail.com (Alec) Date: Fri, 29 Jan 2021 20:40:44 -0800 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 In-Reply-To: References: Message-ID: No problem at all. If you can't get mmfind compiled... you can do everything it does via mmapplypolicy. But it is certainly easier with mmfind to add in options dynamically. I have modified the program that mmfind invokes... I forget offhand tr_Polsomething.pl to add functions such as -gpfsCompress_lz4 and -gpfsIsCompressed. Spectrum Scale really has way more power than most people know what to do with... I wish there was a much richer library of scripts available. For instance with mmfind, this saved my bacon a few days ago.. as our 416TB file system had less than 400GB free... mmfind -polArgs "-a 8 -N node1,node2 -B 20" /sasfilesystem -mtime +1800 -name '*.sas7bdat' -size +1G -not -gpfsIsCompressed -gpfsCompress_lz4 (I had to add in my own -gpfsIsCompressed and -gpfsCompress_lz4 features... but that was fairly easy) -- Find any file named '*.sas7bdat' over 1800 days (5 years), larger than 1G, and compress it down using lz4... Farmed it out to my two app nodes 8 threads each... and 14000 files compressed overnight. Next morning I had an extra 5TB of free space.. funny thing is I needed to run it on my app nodes to slow down their write capacity so we didn't get a fatal out of capacity. If you really want to have fun, check out the ksh93 built in time functions pairs nicely with this requirement. Output the day of the week corresponding to the last day of February 2008. $ printf "%(%a)T\n" "final day Feb 2008" Fri Output the date corresponding to the third Wednesday in May 2008. $ printf "%(%D)T\n" "3rd wednesday may 2008" 05/21/08 Output what date it was 4 weeks ago. $ printf "%(%D)T\n" "4 weeks ago" 02/18/08 Read more: https://blog.fpmurphy.com/2008/10/ksh93-date-manipulation.html#ixzz6l0Egm6hp On Fri, Jan 29, 2021 at 7:39 PM Owen Morgan < owen.morgan at motionpicturesolutions.com> wrote: > Alec, > > Thank you for your response! > > I get it now! And, I also understand some of the other peoples responses > better as well! > > Not only does this make sense I also suppose that it shows I have to > broaden my 'ideas' as to what tools avaliable can be used more than > mmapplypolicy and policy files alone. Using the power of all of them > provides more ability than just focusing on one! > > Just want to thank you, and the other respondents as you've genuinely > helped me and I've learnt new things in the process (until I posted the > original question I didn't even know mmfind was a thing!) > > Thanks! > > Owen. > > Owen Morgan? 
> Data Wrangler > Motion Picture Solutions Ltd > T: > E: *owen.morgan at motionpicturesolutions.com* > | W: > *motionpicturesolutions.com* > A: Mission Hall, 9?11 North End Road , London , W14 8ST > Motion Picture Solutions Ltd is a company registered in England and Wales > under number 5388229, VAT number 201330482 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Sat Jan 30 05:45:47 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Sat, 30 Jan 2021 05:45:47 +0000 Subject: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD Server with only ib rdma enabled Message-ID: <14218088180e4613847984c44e0321d8@Mail.EDVDesign.cloudia> Hi! Is it possible to mix OPAcards and Infininiband HCAs on the same server? In the faq https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#rdma They talk about RDMA : "RDMA is NOT supported on a node when both Mellanox HCAs and Intel Omni-Path HFIs are ENABLED for RDMA." So do I understand right: When we do NOT enable the opa interface we can still enable IB ? The reason I ask is, that we have a gpfs cluster of 6 NSD Servers (wih access to storage) with opa interfaces which provide access to remote cluster also via OPA. A new cluster with HDR interfaces will be implemented soon They shell have access to the same filesystems When we add HDR interfaces to NSD servers and enable rdma on this network while disabling rdma on opa we would accept the worse performance via opa . We hope that this provides still better perf and less technical overhead than using routers Or am I totally wrong? Thank you very much and keep healthy! Best regards Walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sat Jan 30 10:29:39 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 30 Jan 2021 10:29:39 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 In-Reply-To: References: Message-ID: On 30/01/2021 00:31, Owen Morgan wrote: [SNIP] > > I would prefer to stay in the bounds of the SQL policy rule setup as > that is the framework I have created and started to implement.. > In general SQL is Turing complete. Though I have not checked in detail I believe the SQL of the policy engine is too. I would also note that SQL has a whole bunch of time/date functions. So something like define(offset, 4) define(day, DAYOFWEEK(CURRENT_TIMESTAMP)) define(age,(DAYS(CURRENT_TIMESTAMP)-DAYS(ACCESS_TIME))) define(workingdays, CASE WHEN day=1 THEN offest+1 WHEN day=6 THEN offset WHEN day=7 THEN offset+1 ELSE offset+2 ) /* delete all files from files older than 4 working days */ RULE purge4 DELETE WHERE (age>workingdays) FOR FILESET dummies JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From giovanni.bracco at enea.it Sat Jan 30 17:07:43 2021 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Sat, 30 Jan 2021 18:07:43 +0100 Subject: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD Server with only ib rdma enabled In-Reply-To: <14218088180e4613847984c44e0321d8@Mail.EDVDesign.cloudia> References: <14218088180e4613847984c44e0321d8@Mail.EDVDesign.cloudia> Message-ID: <3bb0f4ca-f6ee-6013-45a0-e783470089f0@enea.it> In our HPC infrastructure we have 6 NSD server, running CentOS 7.4, each of them with with 1 Intel QDR HCA to a QDR Cluster (now 100 nodes SandyBridge cpu it was 300 nodes CentOS 6.5), 1 OPA HCA to the main OPA Cluster (400 nodes Skylake cpu, CentOS 7.3) and 1 Mellanox FDR to DDN storages and it works nicely using RDMA since 2018. GPFS 4.2.3-19. See F. Iannone et al., "CRESCO ENEA HPC clusters: a working example of a multifabric GPFS Spectrum Scale layout," 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, 2019, pp. 1051-1052, doi: 10.1109/HPCS48598.2019.918813 When setting up the system the main trick has been: just use CentOS drivers and do not install OFED We do not use IPoIB. Giovanni On 30/01/21 06:45, Walter Sklenka wrote: > Hi! > > Is it possible to mix OPAcards and Infininiband HCAs on the same server? > > In the faq > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#rdma > > > They talk about RDMA : > > ?RDMA is NOT ?supported on a node when both Mellanox HCAs and Intel > Omni-Path HFIs are ENABLED for RDMA.? > > So do I understand right: When we do NOT enable ?the opa interface we > can still enable IB ? > > The reason I ask ?is, that we have a gpfs cluster of 6 NSD Servers ?(wih > access to storage) ?with opa interfaces which provide access to remote > cluster ?also via OPA. > > A new cluster with HDR interfaces will be implemented soon > > They shell have access to the same filesystems > > When we add HDR interfaces to? NSD servers? and enable rdma on this > network ?while disabling rdma on opa we would accept the worse > performance via opa . We hope that this provides ?still better perf and > less technical overhead ?than using routers > > Or am I totally wrong? > > Thank you very much and keep healthy! > > Best regards > > Walter > > Mit freundlichen Gr??en > */Walter Sklenka/* > */Technical Consultant/* > > EDV-Design Informationstechnologie GmbH > Giefinggasse 6/1/2, A-1210 Wien > Tel: +43 1 29 22 165-31 > Fax: +43 1 29 22 165-90 > E-Mail: sklenka at edv-design.at > Internet: www.edv-design.at > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From Walter.Sklenka at EDV-Design.at Sat Jan 30 20:01:51 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Sat, 30 Jan 2021 20:01:51 +0000 Subject: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD Server with only ib rdma enabled In-Reply-To: <3bb0f4ca-f6ee-6013-45a0-e783470089f0@enea.it> References: <14218088180e4613847984c44e0321d8@Mail.EDVDesign.cloudia> <3bb0f4ca-f6ee-6013-45a0-e783470089f0@enea.it> Message-ID: Hi Giovanni! Thats great! Many thanks for your fast and detailed answer!!!! So this is the way we will go too! Have a nice weekend and keep healthy! Best regards Walter -----Original Message----- From: Giovanni Bracco Sent: Samstag, 30. 
J?nner 2021 18:08 To: gpfsug main discussion list ; Walter Sklenka Subject: Re: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD Server with only ib rdma enabled In our HPC infrastructure we have 6 NSD server, running CentOS 7.4, each of them with with 1 Intel QDR HCA to a QDR Cluster (now 100 nodes SandyBridge cpu it was 300 nodes CentOS 6.5), 1 OPA HCA to the main OPA Cluster (400 nodes Skylake cpu, CentOS 7.3) and 1 Mellanox FDR to DDN storages and it works nicely using RDMA since 2018. GPFS 4.2.3-19. See F. Iannone et al., "CRESCO ENEA HPC clusters: a working example of a multifabric GPFS Spectrum Scale layout," 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, 2019, pp. 1051-1052, doi: 10.1109/HPCS48598.2019.918813 When setting up the system the main trick has been: just use CentOS drivers and do not install OFED We do not use IPoIB. Giovanni On 30/01/21 06:45, Walter Sklenka wrote: > Hi! > > Is it possible to mix OPAcards and Infininiband HCAs on the same server? > > In the faq > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq. > html#rdma > > > They talk about RDMA : > > "RDMA is NOT ?supported on a node when both Mellanox HCAs and Intel > Omni-Path HFIs are ENABLED for RDMA." > > So do I understand right: When we do NOT enable ?the opa interface we > can still enable IB ? > > The reason I ask ?is, that we have a gpfs cluster of 6 NSD Servers ? > (wih access to storage) ?with opa interfaces which provide access to > remote cluster ?also via OPA. > > A new cluster with HDR interfaces will be implemented soon > > They shell have access to the same filesystems > > When we add HDR interfaces to? NSD servers? and enable rdma on this > network ?while disabling rdma on opa we would accept the worse > performance via opa . We hope that this provides ?still better perf > and less technical overhead ?than using routers > > Or am I totally wrong? > > Thank you very much and keep healthy! > > Best regards > > Walter > > Mit freundlichen Gr??en > */Walter Sklenka/* > */Technical Consultant/* > > EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 > Wien > Tel: +43 1 29 22 165-31 > Fax: +43 1 29 22 165-90 > E-Mail: sklenka at edv-design.at > Internet: www.edv-design.at > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From S.J.Thompson at bham.ac.uk Mon Jan 4 12:21:05 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 4 Jan 2021 12:21:05 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools Message-ID: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Hi All, We use Spectrum Protect (TSM) to backup our Scale filesystems. We have the backup setup to use multiple nodes with the PROXY node function turned on (and to some extent also use multiple target servers). This all feels like it is nice and parallel, on the TSM servers, we have disk pools for any ?small? files to drop into (I think we set anything smaller than 20GB) to prevent lots of small files stalling tape drive writes. Whilst digging into why we have slow backups at times, we found that the disk pool empties with a single thread (one drive). 
And looking at the docs: https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints This implies that we are limited to the number of client nodes stored in the pool. i.e. because we have one node and PROXY nodes, we are essentially limited to a single thread streaming out of the disk pool when full. Have we understood this correctly as if so, this appears to make the whole purpose of PROXY nodes sort of pointless if you have lots of small files. Or is there some other setting we should be looking at to increase the number of threads when the disk pool is emptying? (The disk pool itself has Migration Processes: 6) Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From jordi.caubet at es.ibm.com Mon Jan 4 13:36:40 2021 From: jordi.caubet at es.ibm.com (Jordi Caubet Serrabou) Date: Mon, 4 Jan 2021 13:36:40 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Jan 4 13:37:50 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 4 Jan 2021 19:07:50 +0530 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: Hi Diane, Can you help Simon with the below query. Or else would you know who would be the best person to be contacted here. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 04-01-2021 05.51 PM Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Protect and disk pools Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We use Spectrum Protect (TSM) to backup our Scale filesystems. We have the backup setup to use multiple nodes with the PROXY node function turned on (and to some extent also use multiple target servers). This all feels like it is nice and parallel, on the TSM servers, we have disk pools for any ?small? files to drop into (I think we set anything smaller than 20GB) to prevent lots of small files stalling tape drive writes. Whilst digging into why we have slow backups at times, we found that the disk pool empties with a single thread (one drive). And looking at the docs: https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints This implies that we are limited to the number of client nodes stored in the pool. i.e. because we have one node and PROXY nodes, we are essentially limited to a single thread streaming out of the disk pool when full. 
Have we understood this correctly as if so, this appears to make the whole purpose of PROXY nodes sort of pointless if you have lots of small files. Or is there some other setting we should be looking at to increase the number of threads when the disk pool is emptying? (The disk pool itself has Migration Processes: 6) Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Jan 4 13:52:05 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 4 Jan 2021 13:52:05 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: <62F6E92A-31B4-45BE-9FF7-E6DBE0F7526B@bham.ac.uk> Hi Jordi, Thanks, yes it is a disk pool: Protect: TSM01>q stg BACKUP_DISK f=d Storage Pool Name: BACKUP_DISK Storage Pool Type: Primary Device Class Name: DISK Storage Type: DEVCLASS ? Next Storage Pool: BACKUP_ONSTAPE So it is a disk pool ? though it is made up of multiple disk files ? /tsmdisk/stgpool/tsmins- BACKUP_DISK DISK 200.0 G 0.0 On-Line t3/bkup_diskvol01.dsm /tsmdisk/stgpool/tsmins- BACKUP_DISK DISK 200.0 G 0.0 On-Line t3/bkup_diskvol02.dsm /tsmdisk/stgpool/tsmins- BACKUP_DISK DISK 200.0 G 0.0 On-Line t3/bkup_diskvol03.dsm Will look into the FILE pool as this sounds like it might be less single threaded than now ? Thanks Simon From: on behalf of "jordi.caubet at es.ibm.com" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 4 January 2021 at 13:36 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Spectrum Protect and disk pools Simon, which kind of storage pool are you using, DISK or FILE ? I understand DISK pool from your mail. DISK pool does not behave the same as FILE pool. DISK pool is limited by the number of nodes or MIGProcess setting (the minimum of both) as the document states. Using proxy helps you backup in parallel from multiple nodes to the stg pool but from Protect perspective it is a single node. Even multiple nodes are sending they run "asnodename" so single node from Protect perspective. If using FILE pool, you can define the number of volumes within the FILE pool and when migrating to tape, it will migrate each volume in parallel with the limit of MIGProcess setting. So it would be the minimum of #volumes and MIGProcess value. I know more deep technical skills in Protect are on this mailing list so feel free to add something or correct me. Best Regards, -- Jordi Caubet Serrabou IBM Storage Client Technical Specialist (IBM Spain) Ext. Phone: (+34) 679.79.17.84 (internal 55834) E-mail: jordi.caubet at es.ibm.com -----gpfsug-discuss-bounces at spectrumscale.org wrote: ----- To: "gpfsug-discuss at spectrumscale.org" > From: Simon Thompson Sent by: gpfsug-discuss-bounces at spectrumscale.org Date: 01/04/2021 01:21PM Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Protect and disk pools Hi All, We use Spectrum Protect (TSM) to backup our Scale filesystems. We have the backup setup to use multiple nodes with the PROXY node function turned on (and to some extent also use multiple target servers). This all feels like it is nice and parallel, on the TSM servers, we have disk pools for any ?small? 
files to drop into (I think we set anything smaller than 20GB) to prevent lots of small files stalling tape drive writes. Whilst digging into why we have slow backups at times, we found that the disk pool empties with a single thread (one drive). And looking at the docs: https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints This implies that we are limited to the number of client nodes stored in the pool. i.e. because we have one node and PROXY nodes, we are essentially limited to a single thread streaming out of the disk pool when full. Have we understood this correctly as if so, this appears to make the whole purpose of PROXY nodes sort of pointless if you have lots of small files. Or is there some other setting we should be looking at to increase the number of threads when the disk pool is emptying? (The disk pool itself has Migration Processes: 6) Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Salvo indicado de otro modo m?s arriba / Unless stated otherwise above: International Business Machines, S.A. Santa Hortensia, 26-28, 28002 Madrid Registro Mercantil de Madrid; Folio 1; Tomo 1525; Hoja M-28146 CIF A28-010791 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Mon Jan 4 15:27:31 2021 From: skylar2 at uw.edu (Skylar Thompson) Date: Mon, 4 Jan 2021 07:27:31 -0800 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: <20210104152731.mwgcj2caojjalony@thargelion> I think the collocation settings of the target pool for the migration come into play as well. If you have multiple filespaces associated with a node and collocation is set to FILESPACE, then you should be able to get one migration process per filespace rather than one per node/collocation group. On Mon, Jan 04, 2021 at 12:21:05PM +0000, Simon Thompson wrote: > Hi All, > > We use Spectrum Protect (TSM) to backup our Scale filesystems. We have the backup setup to use multiple nodes with the PROXY node function turned on (and to some extent also use multiple target servers). > > This all feels like it is nice and parallel, on the TSM servers, we have disk pools for any ???small??? files to drop into (I think we set anything smaller than 20GB) to prevent lots of small files stalling tape drive writes. > > Whilst digging into why we have slow backups at times, we found that the disk pool empties with a single thread (one drive). And looking at the docs: > https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints > > This implies that we are limited to the number of client nodes stored in the pool. i.e. because we have one node and PROXY nodes, we are essentially limited to a single thread streaming out of the disk pool when full. > > Have we understood this correctly as if so, this appears to make the whole purpose of PROXY nodes sort of pointless if you have lots of small files. Or is there some other setting we should be looking at to increase the number of threads when the disk pool is emptying? 
(The disk pool itself has Migration Processes: 6) > > Thanks > > Simon > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From jonathan.buzzard at strath.ac.uk Mon Jan 4 16:24:25 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 4 Jan 2021 16:24:25 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: On 04/01/2021 12:21, Simon Thompson wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > > Hi All, > > We use Spectrum Protect (TSM) to backup our Scale filesystems. We have > the backup setup to use multiple nodes with the PROXY node function > turned on (and to some extent also use multiple target servers). > > This all feels like it is nice and parallel, on the TSM servers, we have > disk pools for any ?small? files to drop into (I think we set anything > smaller than 20GB) to prevent lots of small files stalling tape drive > writes. > > Whilst digging into why we have slow backups at times, we found that the > disk pool empties with a single thread (one drive). And looking at the docs: > > https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints > > > This implies that we are limited to the number of client nodes stored in > the pool. i.e. because we have one node and PROXY nodes, we are > essentially limited to a single thread streaming out of the disk pool > when full. > > Have we understood this correctly as if so, this appears to make the > whole purpose of PROXY nodes sort of pointless if you have lots of small > files. Or is there some other setting we should be looking at to > increase the number of threads when the disk pool is emptying? (The disk > pool itself has Migration Processes: 6) > I have found in the past that the speed of the disk pool can make a large difference. That is a RAID5/6 of 7200RPM drives was inadequate and there was a significant boost in backup in moving to 15k RPM disks. Also your DB really needs to be on SSD, again this affords a large boost in backup speed. The other rule of thumb I have always worked with is that the disk pool should be sized for the daily churn. That is your backup should disappear into the disk pool and then when the backup is finished you can then spit the disk pool out to the primary and copy pools. If you are needing to drain the disk pool mid backup your disk pool is too small. TL;DR your TSM disks (DB and disk pool) need to be some of the best storage you have to maximize backup speed. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From Alec.Effrat at wellsfargo.com Mon Jan 4 17:30:39 2021 From: Alec.Effrat at wellsfargo.com (Alec.Effrat at wellsfargo.com) Date: Mon, 4 Jan 2021 17:30:39 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: <151a06b1b52545fca2f92d3a5e3ce943@wellsfargo.com> I am not sure what platform you run on but for AIX with a fully virtualized LPAR we needed to enable "mtu_bypass" on the en device that was used for our backups. Prior to this setting we could not exceed 250 MB/s on our 10G interface, after that we run at 1.6GB/s solid per 10G virtual adapter, fueled by Spectrum Scale and a different backup engine. We did lose a lot of sleep trying to figure this one out, but are very pleased with the end result. Alec Effrat SAS Lead, AVP Business Intelligence Competency Center SAS Administration Cell?949-246-7713 alec.effrat at wellsfargo.com -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: Monday, January 4, 2021 8:24 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Protect and disk pools On 04/01/2021 12:21, Simon Thompson wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > > Hi All, > > We use Spectrum Protect (TSM) to backup our Scale filesystems. We have > the backup setup to use multiple nodes with the PROXY node function > turned on (and to some extent also use multiple target servers). > > This all feels like it is nice and parallel, on the TSM servers, we > have disk pools for any ?small? files to drop into (I think we set > anything smaller than 20GB) to prevent lots of small files stalling > tape drive writes. > > Whilst digging into why we have slow backups at times, we found that > the disk pool empties with a single thread (one drive). And looking at the docs: > > https://www.ibm.com/support/pages/concurrent-migration-processes-and-c > onstraints > .ibm.com%2Fsupport%2Fpages%2Fconcurrent-migration-processes-and-constr > aints&data=04%7C01%7Cjonathan.buzzard%40strath.ac.uk%7C99158004dad04c7 > 9a58808d8b0ab39b8%7C631e0763153347eba5cd0457bee5944e%7C0%7C0%7C6374535 > 96745356438%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMz > IiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=ZPUkTB5Vy5S0%2BL67neMp4C > 1lxIuphMS5HuTkBYcmcMU%3D&reserved=0> > > This implies that we are limited to the number of client nodes stored > in the pool. i.e. because we have one node and PROXY nodes, we are > essentially limited to a single thread streaming out of the disk pool > when full. > > Have we understood this correctly as if so, this appears to make the > whole purpose of PROXY nodes sort of pointless if you have lots of > small files. Or is there some other setting we should be looking at to > increase the number of threads when the disk pool is emptying? (The > disk pool itself has Migration Processes: 6) > I have found in the past that the speed of the disk pool can make a large difference. That is a RAID5/6 of 7200RPM drives was inadequate and there was a significant boost in backup in moving to 15k RPM disks. Also your DB really needs to be on SSD, again this affords a large boost in backup speed. The other rule of thumb I have always worked with is that the disk pool should be sized for the daily churn. 
That is your backup should disappear into the disk pool and then when the backup is finished you can then spit the disk pool out to the primary and copy pools. If you are needing to drain the disk pool mid backup your disk pool is too small. TL;DR your TSM disks (DB and disk pool) need to be some of the best storage you have to maximize backup speed. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From xhejtman at ics.muni.cz Wed Jan 6 17:46:58 2021 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 6 Jan 2021 18:46:58 +0100 Subject: [gpfsug-discuss] S3 API and POSIX rights Message-ID: <20210106174658.GA1764842@ics.muni.cz> Hello, we are playing a bit with Spectrum Scale OBJ storage. We were able to get working unified access for NFS and OBJ but only if we use swift clients. If we use s3 client for OBJ, all objects are owned by swift user and large objects are multiparted wich is not suitable for unified access. Should the unified access work also for S3 API? Or only swift is supported currently? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From jcatana at gmail.com Wed Jan 6 22:30:08 2021 From: jcatana at gmail.com (Josh Catana) Date: Wed, 6 Jan 2021 17:30:08 -0500 Subject: [gpfsug-discuss] S3 API and POSIX rights In-Reply-To: <20210106174658.GA1764842@ics.muni.cz> References: <20210106174658.GA1764842@ics.muni.cz> Message-ID: Swift and s3 are both object storage, but different protocol implementation. Not compatible. I use minio to share data for s3 compatibility. On Wed, Jan 6, 2021, 12:52 PM Lukas Hejtmanek wrote: > Hello, > > we are playing a bit with Spectrum Scale OBJ storage. We were able to get > working unified access for NFS and OBJ but only if we use swift clients. > If we > use s3 client for OBJ, all objects are owned by swift user and large > objects > are multiparted wich is not suitable for unified access. > > Should the unified access work also for S3 API? Or only swift is supported > currently? > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brnelson at us.ibm.com Thu Jan 7 00:18:28 2021 From: brnelson at us.ibm.com (Brian Nelson) Date: Wed, 6 Jan 2021 18:18:28 -0600 Subject: [gpfsug-discuss] S3 API and POSIX rights Message-ID: Unfortunately, these features are not supported. Multipart uploads are not supported with Unified File and Object for the reason you mentioned, as the separate parts of the object are written as separate files. And because the S3 and Swift authentication is handled differently, the user is not passed through in the S3 path. Without the user information, the Unified File and Object layer is not able to set the file ownership to the external authentication user. Ownership is set to the default of 'swift' in that case. 
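[For illustration, and not part of the original reply: one way to observe the ownership difference described above is to upload the same file once through Swift and once through the S3 API and compare the POSIX owner on the Scale side. The endpoint URL, container name and fileset path below are placeholders -- the real object path layout depends on how unified file and object access is configured, and credentials/environment variables are assumed to be set up already.]

# upload the same file once via the Swift client and once via the S3 API
swift upload testcontainer hello.txt
aws s3 cp hello.txt s3://testcontainer/hello-s3.txt --endpoint-url http://<ces-ip>:8080

# then compare the POSIX owner of the resulting files under the object fileset
ls -ln /gpfs/<filesystem>/<object-fileset>/.../testcontainer/
# the Swift upload carries the authenticated user through, while the
# S3 upload shows up owned by 'swift', as described above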
-Brian =================================== Brian Nelson 512-286-7735 (T/L) 363-7735 IBM Spectrum Scale brnelson at us.ibm.com On Wed, Jan 6, 2021, 12:52 PM Lukas Hejtmanek wrote: > Hello, > > we are playing a bit with Spectrum Scale OBJ storage. We were able to get > working unified access for NFS and OBJ but only if we use swift clients. > If we > use s3 client for OBJ, all objects are owned by swift user and large > objects > are multiparted wich is not suitable for unified access. > > Should the unified access work also for S3 API? Or only swift is supported > currently? > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Jan 7 08:36:25 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 7 Jan 2021 14:06:25 +0530 Subject: [gpfsug-discuss] S3 API and POSIX rights In-Reply-To: <20210106174658.GA1764842@ics.muni.cz> References: <20210106174658.GA1764842@ics.muni.cz> Message-ID: Hi Brian, Can you please answer the below S3 API related query. Or would you know who would be the right person to forward this to. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Lukas Hejtmanek To: gpfsug-discuss at spectrumscale.org Date: 06-01-2021 11.22 PM Subject: [EXTERNAL] [gpfsug-discuss] S3 API and POSIX rights Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, we are playing a bit with Spectrum Scale OBJ storage. We were able to get working unified access for NFS and OBJ but only if we use swift clients. If we use s3 client for OBJ, all objects are owned by swift user and large objects are multiparted wich is not suitable for unified access. Should the unified access work also for S3 API? Or only swift is supported currently? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From juergen.hannappel at desy.de Fri Jan 8 12:27:27 2021 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Fri, 8 Jan 2021 13:27:27 +0100 (CET) Subject: [gpfsug-discuss] GPFS_CLEAR_FILE_CACHE fails on Read-Only FS Message-ID: <933204168.28588491.1610108847123.JavaMail.zimbra@desy.de> Hi, in a program after reading a file I did a gpfs_fcntl() with GPFS_CLEAR_FILE_CACHE to get rid of the now unused pages in the file cache. That works fine, but if the file system is read-only (in a remote cluster) this fails with a message that the file system is read only. Is that expected behaviour or an unexpected feature (aka bug)? -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1711 bytes Desc: S/MIME Cryptographic Signature URL: From scale at us.ibm.com Fri Jan 8 13:42:25 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 8 Jan 2021 08:42:25 -0500 Subject: [gpfsug-discuss] GPFS_CLEAR_FILE_CACHE fails on Read-Only FS In-Reply-To: <933204168.28588491.1610108847123.JavaMail.zimbra@desy.de> References: <933204168.28588491.1610108847123.JavaMail.zimbra@desy.de> Message-ID: It seems like a defect. Could you please open a help case and if possible provide a sample program and the steps you took to create the problem? Also, please provide the version of Scale you are using where you see this behavior. This should result in a defect being opened against GPFS which will then be addressed by a member of the development team. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Hannappel, Juergen" To: gpfsug main discussion list Date: 01/08/2021 07:33 AM Subject: [EXTERNAL] [gpfsug-discuss] GPFS_CLEAR_FILE_CACHE fails on Read-Only FS Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, in a program after reading a file I did a gpfs_fcntl() with GPFS_CLEAR_FILE_CACHE to get rid of the now unused pages in the file cache. That works fine, but if the file system is read-only (in a remote cluster) this fails with a message that the file system is read only. Is that expected behaviour or an unexpected feature (aka bug)? -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 [attachment "smime.p7s" deleted by Frederick Stock/Pittsburgh/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hoov at us.ibm.com Mon Jan 11 17:32:48 2021 From: hoov at us.ibm.com (Theodore Hoover Jr) Date: Mon, 11 Jan 2021 17:32:48 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cloud Online Survey In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16082105961220.jpg Type: image/jpeg Size: 6839 bytes Desc: not available URL: From Philipp.Rehs at uni-duesseldorf.de Mon Jan 11 18:53:29 2021 From: Philipp.Rehs at uni-duesseldorf.de (Rehs, Philipp Helo) Date: Mon, 11 Jan 2021 18:53:29 +0000 Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Message-ID: <63152da6-4464-4497-b4d2-11f8d2260614@email.android.com> Hello, we have gpfs GUI on 4.2.3.22 running and it suddenly stopped to create new snapshots from schedule. I can manually create snapshots but none is created from schedule. How can I debug it? Kind regards Philipp Rehs -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jan 11 19:07:44 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 11 Jan 2021 19:07:44 +0000 Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Message-ID: Have you tried restarting the gpfs.gui service? At some point in the past we have seen similar and restarting the GUI made it start again. Simon From: on behalf of "Philipp.Rehs at uni-duesseldorf.de" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 11 January 2021 at 19:03 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Hello, we have gpfs GUI on 4.2.3.22 running and it suddenly stopped to create new snapshots from schedule. I can manually create snapshots but none is created from schedule. How can I debug it? Kind regards Philipp Rehs -------------- next part -------------- An HTML attachment was scrubbed... URL: From Philipp.Rehs at uni-duesseldorf.de Mon Jan 11 19:16:52 2021 From: Philipp.Rehs at uni-duesseldorf.de (Rehs, Philipp Helo) Date: Mon, 11 Jan 2021 19:16:52 +0000 Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Message-ID: Hello Simon, I have already rebooted the server but no change. I also see no calls to mmcrsnapshot in the journalctl sudo log. Maybe there is a service which is not running? Kind regards Philipp Am 11.01.2021 20:07 schrieb Simon Thompson : Have you tried restarting the gpfs.gui service? At some point in the past we have seen similar and restarting the GUI made it start again. Simon From: on behalf of "Philipp.Rehs at uni-duesseldorf.de" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 11 January 2021 at 19:03 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Hello, we have gpfs GUI on 4.2.3.22 running and it suddenly stopped to create new snapshots from schedule. I can manually create snapshots but none is created from schedule. How can I debug it? Kind regards Philipp Rehs -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Tue Jan 12 10:46:16 2021 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Tue, 12 Jan 2021 11:46:16 +0100 Subject: [gpfsug-discuss] GPFS GUI does not create snapshots In-Reply-To: References: Message-ID: Hello Philipp. there is no additional service that covers the snapshot scheduling besides the GUI service. 
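[A couple of quick command-line checks that can help while investigating this -- an informal sketch only; the systemd unit is normally called gpfsgui, but names can vary slightly between releases, and 'gpfs0' below is a placeholder file system name.]

# is the GUI, which owns the snapshot scheduler, actually running?
systemctl status gpfsgui

# what snapshots currently exist, and when was the last one created?
mmlssnapshot gpfs0

# if the scheduler looks wedged, restarting the GUI (as suggested earlier
# in this thread) is harmless and often enough
systemctl restart gpfsgui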
Please note, that in case you have two GUI instances running, the snapshot scheduling would have moved to the second instance in case you reboot. The GUI/REST application logs are located in /var/log/cnlog/mgtsrv, but I propose to open a support case for this issue. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder IBM Systems / Lab Services Europe / EMEA Storage Competence Center Phone: +49 162 4159920 IBM Deutschland GmbH E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Sebastian Krause / Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer / Sitz der Gesellschaft: 71139 Ehningen, IBM-Allee 1 / Registergericht: Amtsgericht Stuttgart, HRB14562 From: "Rehs, Philipp Helo" To: gpfsug main discussion list Date: 11.01.2021 20:17 Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS GUI does not create snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Simon, I have already rebooted the server but no change. I also see no calls to mmcrsnapshot in the journalctl sudo log. Maybe there is a service which is not running? Kind regards Philipp Am 11.01.2021 20:07 schrieb Simon Thompson : Have you tried restarting the gpfs.gui service? At some point in the past we have seen similar and restarting the GUI made it start again. Simon From: on behalf of "Philipp.Rehs at uni-duesseldorf.de" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 11 January 2021 at 19:03 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Hello, we have gpfs GUI on 4.2.3.22 running and it suddenly stopped to create new snapshots from schedule. I can manually create snapshots but none is created from schedule. How can I debug it? Kind regards Philipp Rehs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1E685739.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From cabrillo at ifca.unican.es Tue Jan 12 14:32:23 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Tue, 12 Jan 2021 15:32:23 +0100 (CET) Subject: [gpfsug-discuss] Disk in unrecovered state Message-ID: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> Dear, Since this moning I have a couple of disk (7) in down state, I have tried to start them again but after that they change to unrecovered. These "failed" disk are only DATA. Both pool Data and Metadata has two failures groups, and set replica to 2. The Metadata disks are in two different enclosures one for each filure group. The filesystem has been unmounted , but when i have tried to run the mmfsck told me the I should remove the down disk [root at gpfs06 ~]# mmlsdisk gpfs2 -L | grep -v up disk driver sector failure holds holds storage ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- ..... 
nsd18jbod1 nsd 512 2 No Yes to be emptied unrecovered 26 data nsd19jbod1 nsd 512 2 No Yes ready unrecovered 27 data nsd19jbod2 nsd 512 3 No Yes ready down 46 data nsd24jbod2 nsd 512 3 No Yes ready down 51 data nsd57jbod1 nsd 512 2 No Yes ready down 109 data nsd61jbod1 nsd 512 2 No Yes ready down 113 data nsd71jbod1 nsd 512 2 No Yes ready down 123 data ..... Any help is welcomed. Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Jan 12 15:11:22 2021 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 12 Jan 2021 15:11:22 +0000 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> Message-ID: Definitely recommend getting a IBM Case in and ask someone for direct assistance (Zoom even). Then also check that you can access all of the underlying storage with READ ONLY operations from all defined NSD Servers in the NSD ServerList for nsd18jbod1 and nsd19jbod1. Given the name of the NSDs, sound like there is not any RAID protection on theses disks. If so then you would have serious data loss issues with one of the drives corrupted. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Iban Cabrillo Sent: Tuesday, January 12, 2021 8:32 AM To: gpfsug-discuss Subject: [gpfsug-discuss] Disk in unrecovered state [EXTERNAL EMAIL] Dear, Since this moning I have a couple of disk (7) in down state, I have tried to start them again but after that they change to unrecovered. These "failed" disk are only DATA. Both pool Data and Metadata has two failures groups, and set replica to 2. The Metadata disks are in two different enclosures one for each filure group. The filesystem has been unmounted , but when i have tried to run the mmfsck told me the I should remove the down disk [root at gpfs06 ~]# mmlsdisk gpfs2 -L | grep -v up disk driver sector failure holds holds storage ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- ..... nsd18jbod1 nsd 512 2 No Yes to be emptied unrecovered 26 data nsd19jbod1 nsd 512 2 No Yes ready unrecovered 27 data nsd19jbod2 nsd 512 3 No Yes ready down 46 data nsd24jbod2 nsd 512 3 No Yes ready down 51 data nsd57jbod1 nsd 512 2 No Yes ready down 109 data nsd61jbod1 nsd 512 2 No Yes ready down 113 data nsd71jbod1 nsd 512 2 No Yes ready down 123 data ..... Any help is welcomed. Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Jan 12 15:21:33 2021 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 12 Jan 2021 15:21:33 +0000 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> Message-ID: <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Hallo Iban, first you should check the path to the disk. (mmlsnsd -m) It seems to be broken from the OS view. This should fixed first. If you see no dev entry you have a HW problem. If this is fixed then you can start each disk individuell to see there are something start here. On wich scale version do you are? 
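[For readers following along, the checks being suggested here look roughly like this -- an illustrative sequence using the NSD, device and file system names that appear in this thread, not a formal recovery procedure; if replicas may be at risk it is worth doing this under guidance from support.]

# 1. confirm each NSD server still sees an OS device path for the disk
mmlsnsd -m | grep nsd18jbod1

# 2. read-only sanity check that the device is actually readable
dd if=/dev/sds bs=4k count=2 | od -c

# 3. if paths and reads look fine, try bringing the disks back up,
#    either individually or all at once
mmchdisk gpfs2 start -d nsd18jbod1
mmchdisk gpfs2 start -a

# 4. re-check the disk states afterwards
mmlsdisk gpfs2 -L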
Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Thomas Sehn, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Iban Cabrillo Gesendet: Dienstag, 12. Januar 2021 15:32 An: gpfsug-discuss Betreff: [gpfsug-discuss] Disk in unrecovered state Dear, Since this moning I have a couple of disk (7) in down state, I have tried to start them again but after that they change to unrecovered. These "failed" disk are only DATA. Both pool Data and Metadata has two failures groups, and set replica to 2. The Metadata disks are in two different enclosures one for each filure group. The filesystem has been unmounted , but when i have tried to run the mmfsck told me the I should remove the down disk [root at gpfs06 ~]# mmlsdisk gpfs2 -L | grep -v up disk driver sector failure holds holds storage ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- ..... nsd18jbod1 nsd 512 2 No Yes to be emptied unrecovered 26 data nsd19jbod1 nsd 512 2 No Yes ready unrecovered 27 data nsd19jbod2 nsd 512 3 No Yes ready down 46 data nsd24jbod2 nsd 512 3 No Yes ready down 51 data nsd57jbod1 nsd 512 2 No Yes ready down 109 data nsd61jbod1 nsd 512 2 No Yes ready down 113 data nsd71jbod1 nsd 512 2 No Yes ready down 123 data ..... Any help is welcomed. Regards, I -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cabrillo at ifca.unican.es Tue Jan 12 15:59:03 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Tue, 12 Jan 2021 16:59:03 +0100 (CET) Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Message-ID: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> Hi Renar, The version we are installed is 5.0.4-3, and the paths to these wrong disks seems to be fine: [root at gpfs06 ~]# mmlsnsd -m| grep nsd18jbod1 nsd18jbod1 0A0A00675EE76CF5 /dev/sds gpfs05.ifca.es server node nsd18jbod1 0A0A00675EE76CF5 /dev/sdby gpfs06.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd19jbod1 nsd19jbod1 0A0A00665EE76CF6 /dev/sdt gpfs05.ifca.es server node nsd19jbod1 0A0A00665EE76CF6 /dev/sdaa gpfs06.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd19jbod2 nsd19jbod2 0A0A00695EE79A12 /dev/sdt gpfs07.ifca.es server node nsd19jbod2 0A0A00695EE79A12 /dev/sdat gpfs08.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd24jbod2 nsd24jbod2 0A0A00685EE79749 /dev/sdbn gpfs07.ifca.es server node nsd24jbod2 0A0A00685EE79749 /dev/sdcg gpfs08.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd57jbod1 nsd57jbod1 0A0A00665F243CE1 /dev/sdbg gpfs05.ifca.es server node nsd57jbod1 0A0A00665F243CE1 /dev/sdbx gpfs06.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd61jbod1 nsd61jbod1 0A0A00665F243CFA /dev/sdbk gpfs05.ifca.es server node nsd61jbod1 0A0A00665F243CFA /dev/sdy gpfs06.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd71jbod1 nsd71jbod1 0A0A00665F243D38 /dev/sdbu gpfs05.ifca.es server node nsd71jbod1 0A0A00665F243D38 /dev/sdbv gpfs06.ifca.es server node trying to start 19jbod1 again: [root at gpfs06 ~]# mmchdisk gpfs2 start -d nsd19jbod1 mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. gpfs06.ifca.es: Rediscovered nsd server access to nsd19jbod1. gpfs05.ifca.es: Rediscovered nsd server access to nsd19jbod1. Failed to open gpfs2. Log recovery failed. Input/output error Initial disk state was updated successfully, but another error may have changed the state again. mmchdisk: Command failed. Examine previous error messages to determine cause. Regards, I From olaf.weiser at de.ibm.com Tue Jan 12 16:30:24 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 12 Jan 2021 16:30:24 +0000 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> References: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es>, <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es><3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Message-ID: An HTML attachment was scrubbed... URL: From nikhilk at us.ibm.com Tue Jan 12 17:32:08 2021 From: nikhilk at us.ibm.com (Nikhil Khandelwal) Date: Tue, 12 Jan 2021 17:32:08 +0000 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: References: , <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es>, <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es><3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Message-ID: An HTML attachment was scrubbed... 
URL: From cabrillo at ifca.unican.es Wed Jan 13 10:23:20 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Wed, 13 Jan 2021 11:23:20 +0100 (CET) Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: References: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Message-ID: <538725591.388718.1610533400379.JavaMail.zimbra@ifca.unican.es> Hi Guys, Devices seems to be accesible from both server primary and secondary, and thr harware state is "Optimal" [root at gpfs05 ~]# mmlsnsd -m| grep nsd18jbod1 nsd18jbod1 0A0A00675EE76CF5 /dev/sds gpfs05.ifca.es server node nsd18jbod1 0A0A00675EE76CF5 /dev/sdby gpfs06.ifca.es server node [root at gpfs05 ~]# #dd if=/dev/sds [root at gpfs05 ~]# man od [root at gpfs05 ~]# dd if=/dev/sds bs=4k count=2 | od -c 2+0 records in 2+0 records out 8192 bytes (8.2 kB) copied, 0.000249162 s, 32.9 MB/s 0000000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0000700 001 \0 356 376 377 377 001 \0 \0 \0 377 377 377 377 \0 \0 0000720 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0000760 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 U 252 0001000 E F I P A R T \0 \0 001 \0 \ \0 \0 \0 0001020 \r 0 267 u \0 \0 \0 \0 001 \0 \0 \0 \0 \0 \0 \0 0001040 257 * 201 243 003 \0 \0 \0 " \0 \0 \0 \0 \0 \0 \0 0001060 216 * 201 243 003 \0 \0 \0 240 ! 302 3 . R \f M 0001100 200 241 323 024 245 h | G 002 \0 \0 \0 \0 \0 \0 \0 0001120 200 \0 \0 \0 200 \0 \0 \0 p b 203 F \0 \0 \0 \0 0001140 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0002000 220 374 257 7 } 357 226 N 221 303 - z 340 U 261 t 0002020 316 343 324 ' } 033 K C 203 a 314 = 220 k 336 023 0002040 0 \0 \0 \0 \0 \0 \0 \0 177 * 201 243 003 \0 \0 \0 0002060 001 \0 \0 \0 \0 \0 \0 @ G \0 P \0 F \0 S \0 0002100 : \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 0002120 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0020000 Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Wed Jan 13 11:51:44 2021 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Wed, 13 Jan 2021 12:51:44 +0100 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es><3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> Message-ID: Hi Iban, given that you have physical access to the disks and they are readable ( i see you checked that via dd command) you should mmchdisk start them. Note: as you have down disks in more than one FG, you will need to be able to at least get one good copy of the metadata readable .. in order to be able to mmchdisk start a disk. 
In that case i would run : mmchdisk start -a (so gpfs can get data from all readable disks) Mit freundlichen Gr??en / Kind regards Achim Rehor Remote Technical Support Engineer Storage IBM Systems Storage Support - EMEA Storage Competence Center (ESCC) Spectrum Scale / Elastic Storage Server ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49-170-4521194 E-Mail: Achim.Rehor at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Sebastian Krause Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 gpfsug-discuss-bounces at spectrumscale.org wrote on 12/01/2021 16:59:03: > From: Iban Cabrillo > To: gpfsug-discuss > Date: 12/01/2021 16:59 > Subject: [EXTERNAL] Re: [gpfsug-discuss] Disk in unrecovered state > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Hi Renar, > The version we are installed is 5.0.4-3, and the paths to these > wrong disks seems to be fine: > > [root at gpfs06 ~]# mmlsnsd -m| grep nsd18jbod1 > nsd18jbod1 0A0A00675EE76CF5 /dev/sds gpfs05.ifca.es > server node > nsd18jbod1 0A0A00675EE76CF5 /dev/sdby gpfs06.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd19jbod1 > nsd19jbod1 0A0A00665EE76CF6 /dev/sdt gpfs05.ifca.es > server node > nsd19jbod1 0A0A00665EE76CF6 /dev/sdaa gpfs06.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd19jbod2 > nsd19jbod2 0A0A00695EE79A12 /dev/sdt gpfs07.ifca.es > server node > nsd19jbod2 0A0A00695EE79A12 /dev/sdat gpfs08.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd24jbod2 > nsd24jbod2 0A0A00685EE79749 /dev/sdbn gpfs07.ifca.es > server node > nsd24jbod2 0A0A00685EE79749 /dev/sdcg gpfs08.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd57jbod1 > nsd57jbod1 0A0A00665F243CE1 /dev/sdbg gpfs05.ifca.es > server node > nsd57jbod1 0A0A00665F243CE1 /dev/sdbx gpfs06.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd61jbod1 > nsd61jbod1 0A0A00665F243CFA /dev/sdbk gpfs05.ifca.es > server node > nsd61jbod1 0A0A00665F243CFA /dev/sdy gpfs06.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd71jbod1 > nsd71jbod1 0A0A00665F243D38 /dev/sdbu gpfs05.ifca.es > server node > nsd71jbod1 0A0A00665F243D38 /dev/sdbv gpfs06.ifca.es > server node > > trying to start 19jbod1 again: > [root at gpfs06 ~]# mmchdisk gpfs2 start -d nsd19jbod1 > mmnsddiscover: Attempting to rediscover the disks. This may take awhile ... > mmnsddiscover: Finished. > gpfs06.ifca.es: Rediscovered nsd server access to nsd19jbod1. > gpfs05.ifca.es: Rediscovered nsd server access to nsd19jbod1. > Failed to open gpfs2. > Log recovery failed. > Input/output error > Initial disk state was updated successfully, but another error may > have changed the state again. > mmchdisk: Command failed. Examine previous error messages to determine cause. 
> > Regards, I > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > INVALID URI REMOVED > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=f4oAWXtPlhIm5cEShA0Amlf1ZUG3PyXvVbzB9e- > I3hk&s=SA1wXw8XXPjvMbSU6TILc2vnC4KxkfoboM8RolqBmuc&e= > From cabrillo at ifca.unican.es Wed Jan 13 12:26:17 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Wed, 13 Jan 2021 13:26:17 +0100 (CET) Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> Message-ID: <1691473688.446197.1610540777278.JavaMail.zimbra@ifca.unican.es> Thanks a lot!! Guys, this do the trick Now, whole disks are up again and the FS has been mounted without troubles, Cheers, I From anacreo at gmail.com Wed Jan 20 11:09:27 2021 From: anacreo at gmail.com (Alec) Date: Wed, 20 Jan 2021 03:09:27 -0800 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data Message-ID: We have AIX and Spectrum Scale 5.1 and are compressing older data. We can compress data at about 10GB/minute and decompress data wicked fast using mmchattr, when a user reads data from a compressed file via application open / read calls.... it moves at about 5MB/s. Normally our I/O pipeline allows for 2400MB/s on a single file read. What can we look at to speed up the read of the compressed data, are there any tunables that might affect this? As it is now if the backup daemon is backing up a compressed file, it can get stuck for hours, I will go and mmchattr to decompress the file, within a minute the file is decompressed, and backed up, then I simply recompress the file once backup has moved on. Any advice on how to improve the compressed reads under AIX would be very helpful. Alec -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Wed Jan 20 11:59:39 2021 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 20 Jan 2021 11:59:39 +0000 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jan 20 14:47:07 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 20 Jan 2021 15:47:07 +0100 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data In-Reply-To: References: Message-ID: This sounds like a bug to me... (I wouldn't expect mmchattr works on different node than other file access). I would check "mmdiag --iohist verbose" during these slow reads, to see if it gives a hint at what it's doing, versus what it shows during "mmchattr". Maybe one is triggering prefetch, while the other is some kind of random IO ? Also might be worth to try a mmtrace. Compare the traces for mmtrace start trace="all 0 vnode 1 vnop 1 io 1" cat compressedLargeFile mmtrace stop vs.: mmtrace start trace="all 0 vnode 1 vnop 1 io 1" mmchattr --compress no someLargeFile mmtrace stop (but please make sure that the file wasn't already uncompressed in pagepool in this second run). -jf On Wed, Jan 20, 2021 at 12:59 PM Daniel Kidger wrote: > I think you need to think about which node the file is being decompressed > on (and if that node has plenty of space in the page pool.) 
> iirc mmchattr works on one of the 'manager' nodes not necessarily the node > you typed the command on? > Daniel > > _________________________________________________________ > *Daniel Kidger Ph.D.* > IBM Technical Sales Specialist > Spectrum Scale, Spectrum Discover and IBM Cloud Object Storage > > +44-(0)7818 522 266 > daniel.kidger at uk.ibm.com > > > > > > > > > > ----- Original message ----- > From: Alec > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale 5 and Reading > Compressed Data > Date: Wed, Jan 20, 2021 11:10 > > We have AIX and Spectrum Scale 5.1 and are compressing older data. > > We can compress data at about 10GB/minute and decompress data wicked fast > using mmchattr, when a user reads data from a compressed file via > application open / read calls.... it moves at about 5MB/s. Normally our > I/O pipeline allows for 2400MB/s on a single file read. > > What can we look at to speed up the read of the compressed data, are there > any tunables that might affect this? > > As it is now if the backup daemon is backing up a compressed file, it can > get stuck for hours, I will go and mmchattr to decompress the file, within > a minute the file is decompressed, and backed up, then I simply recompress > the file once backup has moved on. > > Any advice on how to improve the compressed reads under AIX would be very > helpful. > > Alec > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=USgQqOp8HDCg0DYjdjSVFvVOwq1rMgRYPP_hoZqgUyI&s=_hdEB3EvWW-8ZzdS1D1roh92-AicdrVMywJwQGlKTIQ&e= > > > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Wed Jan 20 22:10:39 2021 From: anacreo at gmail.com (Alec) Date: Wed, 20 Jan 2021 14:10:39 -0800 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data Message-ID: I see a lot of references to the page pool. Our page pool is only 8 gb and our files can be very large into the terrabytes. I will try increasing the page pool in dev to 2x a test file and see if the problem resolves. Any documentation on the correlation here would be nice. I will see if I can get rights for the debug as well. Alec -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Fri Jan 22 11:44:56 2021 From: anacreo at gmail.com (Alec) Date: Fri, 22 Jan 2021 03:44:56 -0800 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data In-Reply-To: References: Message-ID: When comparing compression performance I see the following performance, is anyone else getting significantly higher on any other systems? 
Read Speeds: lz4 with null fill data, ~ 90MB/s lz4 with a SAS data set, ~40-50MB/s z with null fill data, ~ 15MB/s z with a SAS data set, ~ 5MB/s While on a 4G page pool I tested each of these file sizes and got roughly identical performance in all cases: 1 GB, 5 GB, and 10GB. This was on an S824 (p8) with read performance typically going to 1.2GB/s of read on a single thread (non-compressed). Doing a "very limited test" in Production hardware E850, 8gb Page Pool, with ~2.4 GB/s of read on a single thread (non-compressed) I got very similar results. In all cases the work was done from the NSD master, and due to the file sizes and the difference in page pool, i'd expect the 1gb files to move at a significantly faster pace if pagepool was a factor. If anyone could tell me what performance they get on their platform and what OS or Hardware they're using, I'd very much be interested. I'm debating if using GPFS to migrate the files to a .gz compressed version, and then providing a fifo mechanism to pipe through the compressed data wouldn't be a better solution. Alec On Wed, Jan 20, 2021 at 2:10 PM Alec wrote: > I see a lot of references to the page pool. Our page pool is only 8 gb and > our files can be very large into the terrabytes. > > I will try increasing the page pool in dev to 2x a test file and see if > the problem resolves. > > Any documentation on the correlation here would be nice. > > I will see if I can get rights for the debug as well. > > Alec > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cabrillo at ifca.unican.es Wed Jan 27 13:20:08 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Wed, 27 Jan 2021 14:20:08 +0100 (CET) Subject: [gpfsug-discuss] cannot unmount fs Message-ID: <1692854831.1435321.1611753608516.JavaMail.zimbra@ifca.unican.es> Dear, We have a couple of GPFS fs, gpfs mount on /gpfs and gpfs2 mount on /gpfs/external, the problem is the mount path of the second fs sometimes is missied I am trying to mmumount this FS in order to change the mount path. but I cann't. If I make a mmumont gpfs2 or mmumount /gpfs/external I get this error: [root at gpfsgui ~]# mmumount gpfs2 Wed Jan 27 14:11:07 CET 2021: mmumount: Unmounting file systems ... umount: /gpfs/external: not mounted (/gpfs/external path exists) If I try to mmchfs -T XXX , the system says that the FS is already mounted. But there is no error in the logs. Any Idea? Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Jan 27 13:28:44 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 27 Jan 2021 13:28:44 +0000 Subject: [gpfsug-discuss] cannot unmount fs In-Reply-To: <1692854831.1435321.1611753608516.JavaMail.zimbra@ifca.unican.es> References: <1692854831.1435321.1611753608516.JavaMail.zimbra@ifca.unican.es> Message-ID: An HTML attachment was scrubbed... URL: From owen.morgan at motionpicturesolutions.com Wed Jan 27 17:14:45 2021 From: owen.morgan at motionpicturesolutions.com (Owen Morgan) Date: Wed, 27 Jan 2021 17:14:45 +0000 Subject: [gpfsug-discuss] General Introduction Message-ID: Hi Everyone, First off thanks for this user group existing! I've already watched a load of the great webinars that were uploaded to YouTube! My name is Owen Morgan and I'm currently the 'Archivist' at Motion Picture Solutions in the UK. MPS is a post-production and distribution facility for the major studios and a multitude of smaller studios. 
Their main area of operation is mastering and localisation of feature films along with trailer creation etc.. They also then have a combined Hard drive and Internet based distribution arm that can distribute all that content to all cinemas in the UK and, with a huge number of growing partners and co-investors, globally as well. My role started of primarily as just archiving data to tar based LTO tapes, but in recent times has moved to using Spectrum Scale and Spectrum Archive and now to pretty much managing those systems from a sysadmin level. Recently MPS invested in a Spectrum Scale system for their content network, and again, I'm starting to take over management of that both on a ILM perspective and actively involved with maintenance and support. Enough about me. I have a 'first question' but will send that over separately over the next day or so to stop this email being a novella! Thanks and nice to meet people! Owen. [Sent from Front] Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 -------------- next part -------------- An HTML attachment was scrubbed... URL: From owen.morgan at motionpicturesolutions.com Wed Jan 27 22:17:09 2021 From: owen.morgan at motionpicturesolutions.com (Owen Morgan) Date: Wed, 27 Jan 2021 22:17:09 +0000 Subject: [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... Message-ID: Hi Everyone, First question from me I appreciate this is policy engine thing as opposed to more fundamental Spectrum Scale so hope its ok! I'm trying to find a 'neat' way within a couple of policy rules to measure different time intervals (in days) but solely interested in WEEK DAYS only (ie delete files older than X week days only). An example is one of the rules a team would like implemented is delete all files older than 10 business days (ie week days only. We are ignoring public holidays as if they don't exist). Followed by a separate rule for a different folder of deleting all files older than 4 business days. The only way I've been able to facilitate this so far for the 4 business days is to separate out Fridays as a separate rule (because Friday - 4 days are all week days), then a separate rule for Monday through Thursday (because timestamp - 4 days has to factor in weekends, so has to actually set the INTERVAL to 6 days). Likewise for the 10 days rule I have to have a method to separate out Monday - Wednesday, and Thursday and Friday. I feel my 'solution', which does work, is extremely messy and not ideal should they want to add more rules as it just makes the policy file very long full of random definitions for all the different scenarios. So whilst the 'rules' are simple thanks to definitions, its the definitions themselves that are stacking up... depending on the interval required I have to create a unique set of is_weekday definitions and unique is_older_than_xdays definitions. 
here is a snippet of the policy:

define( is_older_than_4days,
  ( (CURRENT_TIMESTAMP - CREATION_TIME) >= INTERVAL '4' DAYS ) )

define( is_older_than_6days,
  ( (CURRENT_TIMESTAMP - CREATION_TIME) >= INTERVAL '6' DAYS ) )

define( is_weekday_ex_fri,
  ( DAYOFWEEK(CURRENT_DATE) IN (2,3,4,5) ) )

define( is_fri,
  ( DAYOFWEEK(CURRENT_DATE) = 6 ) )

RULE 'rule name' WHEN is_weekday_ex_fri
  DELETE WHERE include_list /* an include list just not added above */
  AND is_older_than_6days

RULE 'rule name' WHEN is_fri
  DELETE WHERE include_list /* an include list just not added above */
  AND is_older_than_4days

Are there any 'neat' other ways, a tad more 'concise', of calculating INTERVAL X weekdays only, which are easily extendable to any permutation of intervals required? I'm not sure how much SQL you can shoehorn into a policy before mmapplypolicy / the policy engine isn't happy. Thanks in advance, Owen. [Sent from Front] Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 -------------- next part -------------- An HTML attachment was scrubbed... URL: From owen.morgan at motionpicturesolutions.com Thu Jan 28 14:27:35 2021 From: owen.morgan at motionpicturesolutions.com (Owen Morgan) Date: Thu, 28 Jan 2021 14:27:35 +0000 Subject: [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... In-Reply-To: <1360632-1611790655.643971@r36M.X7Dl.WWDV> References: , <1360632-1611790655.643971@r36M.X7Dl.WWDV> Message-ID: Mark, Thank you for taking the time to comment, I genuinely appreciate it! I will digest and look at the mmfind examples (to be honest, didn't know it was a thing.....). Everything I know about Spectrum Scale (and Spectrum Archive) has been self taught so...... I'm pretty sure I'm missing Soooooooooo much useful info! I wish there was like a dummies guide (I've read the redbooks and admin guides as best I can but I know my knowledge is patchy at best)! Once digested I may, or may not, have further questions but I genuinely thank you for your assistance. Owen. [Sent from Front] Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 On Wed, Jan 27 at 11:53 pm, > mark.bergman at uphs.upenn.edu wrote: In the message dated: Wed, 27 Jan 2021 22:17:09 +0000, The pithy ruminations from Owen Morgan on [[External] [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation....] were: => Hi Everyone, => => First question from me I appreciate this is policy engine thing as => opposed to more fundamental Spectrum Scale so hope its ok! It's great. => => I'm trying to find a 'neat' way within a couple of policy rules to => measure different time intervals (in days) but solely interested in WEEK => DAYS only (ie delete files older than X week days only). Policy SQL syntax gives me a headache. For this kind of task, I find that mmfind is your friend -- it's in the "examples" source dir within /usr/lpp/mmfs. Trivial to compile & install. Easier to debug, and it will generate the SQL.
=> => An example is one of the rules a team would like implemented is delete => all files older than 10 business days (ie week days only. We are What about "delete all files older than 12 calendar days" -- by definition, those files are older than 10 business days as well. => ignoring public holidays as if they don't exist). Followed by a separate => rule for a different folder of deleting all files older than 4 business => days. Or, older than 6 calendar days. Or, run this nightly: #! /bin/bash dateOffset=0 if [ `date '+%u'` -le 4 ] ; then # Mon=1, Tue=2, Wed=3, Thu=4 # # For a file to be more than 4 business days old on-or-before the # 4th day of the week, it must span the weekend, so offset the number # of required days in the file age dateOffset=2 fi mmfind -mtime $((4 + $dateOffset)) /path/to/Nuke/After/4/Days -xarg rm -f => => Thanks in advance, => => Owen. [Sent from Front] => => Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: => owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com => A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture => Solutions Ltd is a company registered in England and Wales under number => 5388229, VAT number 201330482 => -------------- next part -------------- An HTML attachment was scrubbed... URL: From jamervi at sandia.gov Thu Jan 28 18:26:37 2021 From: jamervi at sandia.gov (Mervini, Joseph A) Date: Thu, 28 Jan 2021 18:26:37 +0000 Subject: [gpfsug-discuss] Number of vCPUs exceeded Message-ID: <59193954-B649-4DF5-AD21-652922E49FD9@contoso.com> Hi, I haven?t seen this before but one of my remote cluster users reported the system in question is experiencing high loads and is with Scale unmounting the file system. This is the output she is seeing: Wed Jan 27 22:18:34.168 2021: [I] GPFS vCPU limits: Low warning limit 3 vCPUs, High warning limit 256 vCPUs, Hard limit 1536 vCPUs. Wed Jan 27 22:18:34.169 2021: [I] GPFS vCPU limits include all vCPUs that Linux sees as online or possibly online via hot add, ht/smt changes, etc. Wed Jan 27 22:18:34.170 2021: [X] GPFS detected 1792 vCPUs. This exceeds the warning limit of 256 vCPUs and the hard limit of 1536 vCPUs. GPFS will shutdown Any help will be appreciated. Thanks, Joe ==== Joe Mervini Sandia National Laboratories High Performance Computing 505.844.6770 jamervi at sandia.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From mzp at us.ibm.com Thu Jan 28 18:42:56 2021 From: mzp at us.ibm.com (Madhav Ponamgi1) Date: Thu, 28 Jan 2021 13:42:56 -0500 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 In-Reply-To: References: Message-ID: To calculate this directly (if you don't want to depend on a utility) consider the following steps. There are many more such algorithms in the wonderful book Calenderical Calculations. Take the last two digits of the year. Divide by 4, discarding any fraction. Add the day of the month. Add the month's key value: JFM AMJ JAS OND 144 025 036 146 Subtract 1 for January or February of a leap year. For a Gregorian date, add 0 for 1900's, 6 for 2000's, 4 for 1700's, 2 for 1800's; for other years, add or subtract multiples of 400. For a Julian date, add 1 for 1700's, and 1 for every additional century you go back. Add the last two digits of the year. Divide by 7 and take the remainder. 
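A quick worked example, taking Friday, 29 January 2021 and reading a remainder of 1 as Sunday, 2 as Monday, and so on up to 0 for Saturday:

21 / 4 = 5 (discard the fraction)
5 + 29 (day of month) = 34
34 + 1 (key for January) = 35
35 + 6 (Gregorian, 2000's) = 41
41 + 21 (last two digits of the year) = 62
62 mod 7 = 6, which is Friday (2021 is not a leap year, so no leap-year correction applies).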
--- Madhav mzp at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/28/2021 01:32 PM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 108, Issue 18 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... (Owen Morgan) 2. Number of vCPUs exceeded (Mervini, Joseph A) ---------------------------------------------------------------------- Message: 1 Date: Thu, 28 Jan 2021 14:27:35 +0000 From: Owen Morgan To: "mark.bergman at uphs.upenn.edu" , "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... Message-ID: Content-Type: text/plain; charset="utf-8" Mark, Thank you for taking the time to comment, I genuinely appreciate it! I will digest and look at the mmfind examples (to be honest, didn't know it was a thing.....). Everything I know about Spectrum Scale (and Spectrum Archive) has been self taught so...... I'm pretty sure I'm missing Soooooooooo much useful info! I wish there was like a dummies guide (I've read the redbooks and admin guides as best I can but I know my knowledge is patchy at best)! Once digested I may, or may not, have further questions but I genuinely thank you for your assistance. Owen. [Sent from Front] Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 On Wed, Jan 27 at 11:53 pm, > mark.bergman at uphs.upenn.edu< mailto:mark.bergman at uphs.upenn.edu> wrote: In the message dated: Wed, 27 Jan 2021 22:17:09 +0000, The pithy ruminations from Owen Morgan on [[External] [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation....] were: => Hi Everyone, => => First question from me I appreciate this is policy engine thing as => opposed to more fundamental Spectrum Scale so hope its ok! It's great. => => I'm trying to find a 'neat' way within a couple of policy rules to => measure different time intervals (in days) but solely interested in WEEK => DAYS only (ie delete files older than X week days only). Policy SQL syntax gives me a headache. For this kind of task, I find that mmfind is your friend -- it's in the "examples" source dir within /usr/lpp/mmfs. Trivial to compile & install. Easier to debug, and it will generate the SQL. => => An example is one of the rules a team would like implemented is delete => all files older than 10 business days (ie week days only. We are What about "delete all files older than 12 calendar days" -- by definition, those files are older than 10 business days as well. => ignoring public holidays as if they don't exist). 
Followed by a separate => rule for a different folder of deleting all files older than 4 business => days. Or, older than 6 calendar days. Or, run this nightly: #! /bin/bash dateOffset=0 if [ `date '+%u'` -le 4 ] ; then # Mon=1, Tue=2, Wed=3, Thu=4 # # For a file to be more than 4 business days old on-or-before the # 4th day of the week, it must span the weekend, so offset the number # of required days in the file age dateOffset=2 fi mmfind -mtime $((4 + $dateOffset)) /path/to/Nuke/After/4/Days -xarg rm -f => => Thanks in advance, => => Owen. [Sent from Front] => => Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: => owen.morgan at motionpicturesolutions.com< mailto:owen.morgan at motionpicturesolutions.com> | W: motionpicturesolutions.com< http://motionpicturesolutions.com > => A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture => Solutions Ltd is a company registered in England and Wales under number => 5388229, VAT number 201330482 => -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20210128/201a280e/attachment-0001.html > ------------------------------ Message: 2 Date: Thu, 28 Jan 2021 18:26:37 +0000 From: "Mervini, Joseph A" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Number of vCPUs exceeded Message-ID: <59193954-B649-4DF5-AD21-652922E49FD9 at contoso.com> Content-Type: text/plain; charset="utf-8" Hi, I haven?t seen this before but one of my remote cluster users reported the system in question is experiencing high loads and is with Scale unmounting the file system. This is the output she is seeing: Wed Jan 27 22:18:34.168 2021: [I] GPFS vCPU limits: Low warning limit 3 vCPUs, High warning limit 256 vCPUs, Hard limit 1536 vCPUs. Wed Jan 27 22:18:34.169 2021: [I] GPFS vCPU limits include all vCPUs that Linux sees as online or possibly online via hot add, ht/smt changes, etc. Wed Jan 27 22:18:34.170 2021: [X] GPFS detected 1792 vCPUs. This exceeds the warning limit of 256 vCPUs and the hard limit of 1536 vCPUs. GPFS will shutdown Any help will be appreciated. Thanks, Joe ==== Joe Mervini Sandia National Laboratories High Performance Computing 505.844.6770 jamervi at sandia.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20210128/930fadb1/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 108, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From knop at us.ibm.com Thu Jan 28 18:55:36 2021 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 28 Jan 2021 18:55:36 +0000 Subject: [gpfsug-discuss] Number of vCPUs exceeded In-Reply-To: <59193954-B649-4DF5-AD21-652922E49FD9@contoso.com> References: <59193954-B649-4DF5-AD21-652922E49FD9@contoso.com> Message-ID: An HTML attachment was scrubbed... 
URL: From UWEFALKE at de.ibm.com Thu Jan 28 19:54:38 2021 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 28 Jan 2021 20:54:38 +0100 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 In-Reply-To: References: Message-ID: sounds quite complicated. if all public holidays can be ignored it is simple: the algorithm has only to run on week days (the effective age of files would not change on weekend days.). To find the latest date to remove files: Now, enumerate the weekdays, starting with Mon=1 If your max age is T find the integer multiple of 5 and the remainder such that T=T_i*5 +R Determine the current DoW in terms of your enumeration. if DoW - R > 0, your max age date is Dx=D-(R+7*T_i) else your max age date is Dx=D-(R+2+7*T_i dates can be easily compiled in epoch, like D_e=$(date +%s), Dx_e = D_e - 86400*(R+7*T_i) or Dx_e = D_e - 86400*(R+2+7*T_i) you then need to convert the found epoch time back into a christian date which could be done by date --date='@ To: gpfsug-discuss at spectrumscale.org Date: 28/01/2021 19:43 Subject: [EXTERNAL] Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 Sent by: gpfsug-discuss-bounces at spectrumscale.org To calculate this directly (if you don't want to depend on a utility) consider the following steps. There are many more such algorithms in the wonderful book Calenderical Calculations. 1. Take the last two digits of the year. 2. Divide by 4, discarding any fraction. 3. Add the day of the month. 4. Add the month's key value: JFM AMJ JAS OND 144 025 036 146 5. Subtract 1 for January or February of a leap year. 6. For a Gregorian date, add 0 for 1900's, 6 for 2000's, 4 for 1700's, 2 for 1800's; for other years, add or subtract multiples of 400. 7. For a Julian date, add 1 for 1700's, and 1 for every additional century you go back. 8. Add the last two digits of the year. 9. Divide by 7 and take the remainder. --- Madhav mzp at us.ibm.com gpfsug-discuss-request---01/28/2021 01:32:13 PM---Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/28/2021 01:32 PM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 108, Issue 18 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... (Owen Morgan) 2. Number of vCPUs exceeded (Mervini, Joseph A) ---------------------------------------------------------------------- Message: 1 Date: Thu, 28 Jan 2021 14:27:35 +0000 From: Owen Morgan To: "mark.bergman at uphs.upenn.edu" , "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... Message-ID: Content-Type: text/plain; charset="utf-8" Mark, Thank you for taking the time to comment, I genuinely appreciate it! I will digest and look at the mmfind examples (to be honest, didn't know it was a thing.....). 
Everything I know about Spectrum Scale (and Spectrum Archive) has been self taught so...... I'm pretty sure I'm missing Soooooooooo much useful info! I wish there was like a dummies guide (I've read the redbooks and admin guides as best I can but I know my knowledge is patchy at best)! Once digested I may, or may not, have further questions but I genuinely thank you for your assistance. Owen. [Sent from Front] Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 On Wed, Jan 27 at 11:53 pm, > mark.bergman at uphs.upenn.edu< mailto:mark.bergman at uphs.upenn.edu> wrote: In the message dated: Wed, 27 Jan 2021 22:17:09 +0000, The pithy ruminations from Owen Morgan on [[External] [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation....] were: => Hi Everyone, => => First question from me I appreciate this is policy engine thing as => opposed to more fundamental Spectrum Scale so hope its ok! It's great. => => I'm trying to find a 'neat' way within a couple of policy rules to => measure different time intervals (in days) but solely interested in WEEK => DAYS only (ie delete files older than X week days only). Policy SQL syntax gives me a headache. For this kind of task, I find that mmfind is your friend -- it's in the "examples" source dir within /usr/lpp/mmfs. Trivial to compile & install. Easier to debug, and it will generate the SQL. => => An example is one of the rules a team would like implemented is delete => all files older than 10 business days (ie week days only. We are What about "delete all files older than 12 calendar days" -- by definition, those files are older than 10 business days as well. => ignoring public holidays as if they don't exist). Followed by a separate => rule for a different folder of deleting all files older than 4 business => days. Or, older than 6 calendar days. Or, run this nightly: #! /bin/bash dateOffset=0 if [ `date '+%u'` -le 4 ] ; then # Mon=1, Tue=2, Wed=3, Thu=4 # # For a file to be more than 4 business days old on-or-before the # 4th day of the week, it must span the weekend, so offset the number # of required days in the file age dateOffset=2 fi mmfind -mtime $((4 + $dateOffset)) /path/to/Nuke/After/4/Days -xarg rm -f => => Thanks in advance, => => Owen. [Sent from Front] => => Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: => owen.morgan at motionpicturesolutions.com< mailto:owen.morgan at motionpicturesolutions.com> | W: motionpicturesolutions.com => A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture => Solutions Ltd is a company registered in England and Wales under number => 5388229, VAT number 201330482 => -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20210128/201a280e/attachment-0001.html > ------------------------------ Message: 2 Date: Thu, 28 Jan 2021 18:26:37 +0000 From: "Mervini, Joseph A" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Number of vCPUs exceeded Message-ID: <59193954-B649-4DF5-AD21-652922E49FD9 at contoso.com> Content-Type: text/plain; charset="utf-8" Hi, I haven?t seen this before but one of my remote cluster users reported the system in question is experiencing high loads and is with Scale unmounting the file system. This is the output she is seeing: Wed Jan 27 22:18:34.168 2021: [I] GPFS vCPU limits: Low warning limit 3 vCPUs, High warning limit 256 vCPUs, Hard limit 1536 vCPUs. Wed Jan 27 22:18:34.169 2021: [I] GPFS vCPU limits include all vCPUs that Linux sees as online or possibly online via hot add, ht/smt changes, etc. Wed Jan 27 22:18:34.170 2021: [X] GPFS detected 1792 vCPUs. This exceeds the warning limit of 256 vCPUs and the hard limit of 1536 vCPUs. GPFS will shutdown Any help will be appreciated. Thanks, Joe ==== Joe Mervini Sandia National Laboratories High Performance Computing 505.844.6770 jamervi at sandia.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20210128/930fadb1/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 108, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mzp at us.ibm.com Fri Jan 29 12:38:37 2021 From: mzp at us.ibm.com (Madhav Ponamgi1) Date: Fri, 29 Jan 2021 07:38:37 -0500 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 20 In-Reply-To: References: Message-ID: Here is a simple C function posted from comp.lang.c many years ago that works for a restricted range (year > 1752) based on the algorithm I described earlier. dayofweek(y, m, d) { y -= m < 3; return (y + y/4 - y/100 + y/400 + "-bed=pen+mad."[m] + d) % 7; } --- Madhav From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/29/2021 07:00 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 108, Issue 20 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: gpfsug-discuss Digest, Vol 108, Issue 18 (Uwe Falke) ---------------------------------------------------------------------- Message: 1 Date: Thu, 28 Jan 2021 20:54:38 +0100 From: "Uwe Falke" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 Message-ID: Content-Type: text/plain; charset="ISO-8859-1" sounds quite complicated. 
if all public holidays can be ignored it is simple: the algorithm has only to run on week days (the effective age of files would not change on weekend days.). To find the latest date to remove files: Now, enumerate the weekdays, starting with Mon=1 If your max age is T find the integer multiple of 5 and the remainder such that T=T_i*5 +R Determine the current DoW in terms of your enumeration. if DoW - R > 0, your max age date is Dx=D-(R+7*T_i) else your max age date is Dx=D-(R+2+7*T_i dates can be easily compiled in epoch, like D_e=$(date +%s), Dx_e = D_e - 86400*(R+7*T_i) or Dx_e = D_e - 86400*(R+2+7*T_i) you then need to convert the found epoch time back into a christian date which could be done by date --date='@ To: gpfsug-discuss at spectrumscale.org Date: 28/01/2021 19:43 Subject: [EXTERNAL] Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 Sent by: gpfsug-discuss-bounces at spectrumscale.org To calculate this directly (if you don't want to depend on a utility) consider the following steps. There are many more such algorithms in the wonderful book Calenderical Calculations. 1. Take the last two digits of the year. 2. Divide by 4, discarding any fraction. 3. Add the day of the month. 4. Add the month's key value: JFM AMJ JAS OND 144 025 036 146 5. Subtract 1 for January or February of a leap year. 6. For a Gregorian date, add 0 for 1900's, 6 for 2000's, 4 for 1700's, 2 for 1800's; for other years, add or subtract multiples of 400. 7. For a Julian date, add 1 for 1700's, and 1 for every additional century you go back. 8. Add the last two digits of the year. 9. Divide by 7 and take the remainder. --- Madhav mzp at us.ibm.com gpfsug-discuss-request---01/28/2021 01:32:13 PM---Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/28/2021 01:32 PM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 108, Issue 18 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... (Owen Morgan) 2. Number of vCPUs exceeded (Mervini, Joseph A) ---------------------------------------------------------------------- Message: 1 Date: Thu, 28 Jan 2021 14:27:35 +0000 From: Owen Morgan To: "mark.bergman at uphs.upenn.edu" , "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... Message-ID: Content-Type: text/plain; charset="utf-8" Mark, Thank you for taking the time to comment, I genuinely appreciate it! I will digest and look at the mmfind examples (to be honest, didn't know it was a thing.....). Everything I know about Spectrum Scale (and Spectrum Archive) has been self taught so...... I'm pretty sure I'm missing Soooooooooo much useful info! 
I wish there was like a dummies guide (I've read the redbooks and admin guides as best I can but I know my knowledge is patchy at best)! Once digested I may, or may not, have further questions but I genuinely thank you for your assistance. Owen. [Sent from Front] Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 On Wed, Jan 27 at 11:53 pm, > mark.bergman at uphs.upenn.edu< mailto:mark.bergman at uphs.upenn.edu> wrote: In the message dated: Wed, 27 Jan 2021 22:17:09 +0000, The pithy ruminations from Owen Morgan on [[External] [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation....] were: => Hi Everyone, => => First question from me I appreciate this is policy engine thing as => opposed to more fundamental Spectrum Scale so hope its ok! It's great. => => I'm trying to find a 'neat' way within a couple of policy rules to => measure different time intervals (in days) but solely interested in WEEK => DAYS only (ie delete files older than X week days only). Policy SQL syntax gives me a headache. For this kind of task, I find that mmfind is your friend -- it's in the "examples" source dir within /usr/lpp/mmfs. Trivial to compile & install. Easier to debug, and it will generate the SQL. => => An example is one of the rules a team would like implemented is delete => all files older than 10 business days (ie week days only. We are What about "delete all files older than 12 calendar days" -- by definition, those files are older than 10 business days as well. => ignoring public holidays as if they don't exist). Followed by a separate => rule for a different folder of deleting all files older than 4 business => days. Or, older than 6 calendar days. Or, run this nightly: #! /bin/bash dateOffset=0 if [ `date '+%u'` -le 4 ] ; then # Mon=1, Tue=2, Wed=3, Thu=4 # # For a file to be more than 4 business days old on-or-before the # 4th day of the week, it must span the weekend, so offset the number # of required days in the file age dateOffset=2 fi mmfind -mtime $((4 + $dateOffset)) /path/to/Nuke/After/4/Days -xarg rm -f => => Thanks in advance, => => Owen. [Sent from Front] => => Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: => owen.morgan at motionpicturesolutions.com< mailto:owen.morgan at motionpicturesolutions.com> | W: motionpicturesolutions.com< http://motionpicturesolutions.com > => A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture => Solutions Ltd is a company registered in England and Wales under number => 5388229, VAT number 201330482 => -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20210128/201a280e/attachment-0001.html > ------------------------------ Message: 2 Date: Thu, 28 Jan 2021 18:26:37 +0000 From: "Mervini, Joseph A" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Number of vCPUs exceeded Message-ID: <59193954-B649-4DF5-AD21-652922E49FD9 at contoso.com> Content-Type: text/plain; charset="utf-8" Hi, I haven?t seen this before but one of my remote cluster users reported the system in question is experiencing high loads and is with Scale unmounting the file system. 
This is the output she is seeing: Wed Jan 27 22:18:34.168 2021: [I] GPFS vCPU limits: Low warning limit 3 vCPUs, High warning limit 256 vCPUs, Hard limit 1536 vCPUs. Wed Jan 27 22:18:34.169 2021: [I] GPFS vCPU limits include all vCPUs that Linux sees as online or possibly online via hot add, ht/smt changes, etc. Wed Jan 27 22:18:34.170 2021: [X] GPFS detected 1792 vCPUs. This exceeds the warning limit of 256 vCPUs and the hard limit of 1536 vCPUs. GPFS will shutdown Any help will be appreciated. Thanks, Joe ==== Joe Mervini Sandia National Laboratories High Performance Computing 505.844.6770 jamervi at sandia.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20210128/930fadb1/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 108, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 108, Issue 20 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jpr9c at virginia.edu Fri Jan 29 19:47:13 2021 From: jpr9c at virginia.edu (Ruffner, Scott (jpr9c)) Date: Fri, 29 Jan 2021 19:47:13 +0000 Subject: [gpfsug-discuss] Adding client nodes using a shared NFS root image. Message-ID: <4A332838-9D59-477D-AAE2-F79F8AAD143B@virginia.edu> Hi everyone, We want all of our compute nodes (bare metal) to directly participate in the cluster as client nodes; of course, they are sharing a common root image. Adding nodes via the regular mmaddnode (with the dsh operation to replicate files to the clients) isn?t really viable, but if I short-circuit that, and simply generate the /var/mmfs/gen files and then manually copy those and the keyfiles to the shared root images, is that safe? Am I going about this the entirely wrong way? -- Scott Ruffner Senior HPC Engineer UVa Research Computing (434)924-6778(o) (434)295-0250(h) sruffner at virginia.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Jan 29 19:52:04 2021 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 29 Jan 2021 14:52:04 -0500 Subject: [gpfsug-discuss] Adding client nodes using a shared NFS root image. In-Reply-To: <4A332838-9D59-477D-AAE2-F79F8AAD143B@virginia.edu> References: <4A332838-9D59-477D-AAE2-F79F8AAD143B@virginia.edu> Message-ID: <094EDEFE-4B15-4214-90C4-CD83BC76A10A@brown.edu> We use mmsdrrestore after the node boots. In our case these are diskless nodes provisioned by xCAT. The post install script takes care of ensuring infiniband is lit up, and does the mmsdrrestore followed by mmstartup. -- ddj Dave Johnson > On Jan 29, 2021, at 2:47 PM, Ruffner, Scott (jpr9c) wrote: > > ? 
> Hi everyone, > > We want all of our compute nodes (bare metal) to directly participate in the cluster as client nodes; of course, they are sharing a common root image. > > Adding nodes via the regular mmaddnode (with the dsh operation to replicate files to the clients) isn?t really viable, but if I short-circuit that, and simply generate the /var/mmfs/gen files and then manually copy those and the keyfiles to the shared root images, is that safe? > > Am I going about this the entirely wrong way? > > -- > Scott Ruffner > Senior HPC Engineer > UVa Research Computing > (434)924-6778(o) > (434)295-0250(h) > sruffner at virginia.edu > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpr9c at virginia.edu Fri Jan 29 20:04:32 2021 From: jpr9c at virginia.edu (Ruffner, Scott (jpr9c)) Date: Fri, 29 Jan 2021 20:04:32 +0000 Subject: [gpfsug-discuss] Adding client nodes using a shared NFS root image. In-Reply-To: <094EDEFE-4B15-4214-90C4-CD83BC76A10A@brown.edu> References: <4A332838-9D59-477D-AAE2-F79F8AAD143B@virginia.edu> <094EDEFE-4B15-4214-90C4-CD83BC76A10A@brown.edu> Message-ID: <6A72D8F2-65ED-431C-B13F-3D4F189A53DF@virginia.edu> Thanks David! Slick solution. -- Scott Ruffner Senior HPC Engineer UVa Research Computing (434)924-6778(o) (434)295-0250(h) sruffner at virginia.edu From: on behalf of "david_johnson at brown.edu" Reply-To: gpfsug main discussion list Date: Friday, January 29, 2021 at 2:52 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Adding client nodes using a shared NFS root image. We use mmsdrrestore after the node boots. In our case these are diskless nodes provisioned by xCAT. The post install script takes care of ensuring infiniband is lit up, and does the mmsdrrestore followed by mmstartup. -- ddj Dave Johnson On Jan 29, 2021, at 2:47 PM, Ruffner, Scott (jpr9c) wrote: Hi everyone, We want all of our compute nodes (bare metal) to directly participate in the cluster as client nodes; of course, they are sharing a common root image. Adding nodes via the regular mmaddnode (with the dsh operation to replicate files to the clients) isn?t really viable, but if I short-circuit that, and simply generate the /var/mmfs/gen files and then manually copy those and the keyfiles to the shared root images, is that safe? Am I going about this the entirely wrong way? -- Scott Ruffner Senior HPC Engineer UVa Research Computing (434)924-6778(o) (434)295-0250(h) sruffner at virginia.edu _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From owen.morgan at motionpicturesolutions.com Sat Jan 30 00:31:27 2021 From: owen.morgan at motionpicturesolutions.com (Owen Morgan) Date: Sat, 30 Jan 2021 00:31:27 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 Message-ID: Hi all, Sorry I appear to have missed a load of replies and screwed up the threading thing when looking online... not used to this email group thing! Might look at the slack option! 
Just wanted to clarify my general issue a bit:

So the methodology I've started to implement is per-department policy files, where all the rules related to managing a specific team's assets are in one policy file, so I have fine control over when and how each department's rules run and potentially (if it mattered) in what order etc.

So team A want me to manage two folders, where in folder 1a all files older than 4 week days of age are deleted, and in folder 1b all files older than 8 week days are deleted.

They now want me to manage a different set of two folders with two different "thresholds" for how old they need to be in week days before they delete (ie. I now need additional rules for folders 2a and 2b).

The issue is that for each scenario there is a different 'offset' required, depending on the day of the week the policy is run, to maintain the number of weekdays required (the 'threshold' is always in weekdays, so intervening weekends need to be added to take them into account).

For instance, when run on a Monday with a threshold of 4 weekdays of age, I need to be deleting files that were created on the previous Tuesday, which is 6 days (ie 4 days + 2 weekend days). If the threshold was 8 week days, the threshold in terms of the policy would be 12 (ie 8 plus 2x 2 weekend days).

The only way I was able to work this out in the SQL-like policy file was to split the week days into groups where the offset would be the same (so for 4 week days, Monday through Thursday share the offset of 2, which then has to be added to the 4 for the desired result) and then a separate rule for the Friday.

However, for every addition of a different threshold I have to write all new groups to match the days etc., so the policy ends up with 6 rules but 150 lines of definition macros....

I was trying to work out if there was a more concise way of, within the SQL-like framework, programmatically calculating the day offset that needs to be added to the threshold, to allow a more generic function that could just automatically work it out....

The algorithm I have recently thought up is to effectively calculate the difference in weeks between the current run time and the desired deletion day and multiply it by 2.

Pseudocode it would be (threshold is the number of week days for the rule, offset is the number that needs to be added to account for the weekends between those dates):

If current day of month - threshold = Sunday, then add 1 to the threshold value (Sundays are denoted as the week start, so Saturday would represent the previous week).

Offset = (difference between current week and week of (current day of month - threshold)) x 2

A worked example:

Threshold = 11 week days
Policy run on the 21st Jan, which is week 4 of 2021

21st - 11 days = Sunday 10th

Therefore I need to add 1 to the threshold to push the day into the previous week. New threshold is 12.

Saturday 9th is in week 2 of 2021, so the offset is week 4 - week 2 = 2 (ie the difference in weeks) x 2, which is 4.

Add 4 to the original 11 to make 15.

So for the policy running on the 21st Jan to delete only files older than 11 week days of age I need to set my rule to be

Delete where ((Current_date - creation_time) >= interval '15' days)

Unfortunately, I'm now struggling to implement that algorithm..... it seems the SQL-ness is very limited and I can't declare variables to use or stuff.... it's a shame, as that algorithm is generic so only needs to be written once and you could have as many unique rules as you want, all with different thresholds etc...
Is there another way to get the same results? I would prefer to stay in the bounds of the SQL policy rule setup as that is the framework I have created and started to implement.. Hope the above gives more clarity to what Im asking.... sorry if one of the previous rplies addresses this, if it does I clearly was confused by the response (I seriously feel like an amateur at this at the moment and am having to learn all these finer things as I go). Thanks in advance, Owen. Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Sat Jan 30 02:53:49 2021 From: anacreo at gmail.com (Alec) Date: Fri, 29 Jan 2021 18:53:49 -0800 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 In-Reply-To: References: Message-ID: Based on the problem you have. I would write an mmfind / mmxarg command that sets a custom attr such as puge.after, have a ksh/perl/php script that simply makes the necessary calculations using all the tricks it has... Skip files that already have the attribute set, or are too new to bother having the attribute. Then use a single purge policy to query all files that have a purge.after set to the appropriate datestamp. You could get way more concise with this mechanism and have a much simpler process. Alec On Fri, Jan 29, 2021 at 4:32 PM Owen Morgan < owen.morgan at motionpicturesolutions.com> wrote: > Hi all, > > Sorry I appear to have missed a load of replies and screwed up the > threading thing when looking online... not used to this email group thing! > Might look at the slack option! > > Just wanted to clarify my general issue a bit: > > So the methodology I've started to implement is per department policy > files where all rules related to managing a specific teams assets are all > in one policy file and then I have fine control over when and how each > departments rule run, when, and potentially (if it mattered) what order etc. > > > So team a want me to manage two folders where in folder 1a all files older > than 4 week days of age are deleted, and in filder 1b all files older than > 8 week days are deleted. > > They now want me to manage a different set of two folders with two > different "thresholds" for how old they need to be in week days before they > delete (ie. I now need additional rules for folders 2a and 2b). > > > The issue is for each scenario there is a different 'offset' required > depending on the day of the week the policy is run to maintian the number > of weekdays required (the 'threshold' is always in weekdays, so intervening > weekends need to be added to take them into account). > > For instance when run on a Monday, if the threshold were 4 weekdays of > age, I need to be deleting files that were created on the previous Tuesday. > Which is 6 days (ie 4 days + 2 weekend days). If the threshold was 8 week > days the threhold in terms of the policy would be 12 (ie 8 plus 2x 2 > weekend days). 
> > > The only way I was able to work this out in the sql like policy file was > to split the week days into groups where the offset would be the same (so > for 4 week days, Monday through Thursday share the offset of 2 - which then > has to be added to the 4 for the desired result) and then a separate rule > for the Friday. > > > However for every addition of a different threshold I have to write all > new groups to match the days etc.. so the policy ends up with 6 rules but > 150 lines of definition macros.... > > > I was trying to work out if there was a more concise way of, within the > sql like framework, programmatically calculating the day offest the needs > to be added to the threshold to allow a more generic function that could > just automatically work it out.... > > > The algorithm I have recently thought up is to effectively calculate the > difference in weeks between the current run time and the desired deletion > day and multiply it by 2. > > > Psudocode it would be (threshold is the number of week days for the rule, > offset is the number that needs to be added to account for the weekends > between those dates): > > > If current day of month - threshold = sunday, then add 1 to the threshold > value (sundays are de oted as the week start so Saturday would represent > the previous week). > > Offset = (difference between current week and week of (current day of > month - threshold)) x 2 > > A worked example: > > Threshold = 11 week days > Policy run on the 21st Jan which is the week 4 of 2021 > > 21st - 11 days = Sunday 10th > > Therefore need to add 1 to threshold to push the day into the previous > week. New threshold is 12 > > Saturday 9th is in week 2 of 2021 so the offset is week 4 - week 2 = 2 (ie > difference in weeks) x 2 which is 4. > > Add 4 to the original 11 to make 15. > > So for the policy running on the 21st Jan to delete only files older than > 11 week days of age I need to set my rule to be > > Delete where ((Current_date - creation_time) >= interval '15' days > > > Unfortunately, I'm now struggling to implement that algorithm..... it > seems the SQL-ness is very limited and I cant declare variables to use or > stuff.... its a shame as that algorithm is generic so only needs to be > written once and you could have ad many unique rules as you want all with > different thresholds etc... > > Is there another way to get the same results? > > I would prefer to stay in the bounds of the SQL policy rule setup as that > is the framework I have created and started to implement.. > > Hope the above gives more clarity to what Im asking.... sorry if one of > the previous rplies addresses this, if it does I clearly was confused by > the response (I seriously feel like an amateur at this at the moment and am > having to learn all these finer things as I go). > > Thanks in advance, > > Owen. > > Owen Morgan? > Data Wrangler > Motion Picture Solutions Ltd > T: > E: *owen.morgan at motionpicturesolutions.com* > | W: > *motionpicturesolutions.com* > A: Mission Hall, 9?11 North End Road , London , W14 8ST > Motion Picture Solutions Ltd is a company registered in England and Wales > under number 5388229, VAT number 201330482 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From anacreo at gmail.com Sat Jan 30 03:07:24 2021 From: anacreo at gmail.com (Alec) Date: Fri, 29 Jan 2021 19:07:24 -0800 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 In-Reply-To: References: Message-ID: Also a caution on this... you may want to retain the file's modified time in something like purge.modified... so you can also re-calc for files where purge.modified != file modified time. Else you may purge something too early. Alec On Fri, Jan 29, 2021 at 6:53 PM Alec wrote: > Based on the problem you have. > > I would write an mmfind / mmxarg command that sets a custom attr such as > puge.after, have a ksh/perl/php script that simply makes the necessary > calculations using all the tricks it has... Skip files that already have > the attribute set, or are too new to bother having the attribute. > > Then use a single purge policy to query all files that have a purge.after > set to the appropriate datestamp. > > You could get way more concise with this mechanism and have a much simpler > process. > > Alec > > On Fri, Jan 29, 2021 at 4:32 PM Owen Morgan < > owen.morgan at motionpicturesolutions.com> wrote: > >> Hi all, >> >> Sorry I appear to have missed a load of replies and screwed up the >> threading thing when looking online... not used to this email group thing! >> Might look at the slack option! >> >> Just wanted to clarify my general issue a bit: >> >> So the methodology I've started to implement is per department policy >> files where all rules related to managing a specific teams assets are all >> in one policy file and then I have fine control over when and how each >> departments rule run, when, and potentially (if it mattered) what order etc. >> >> >> So team a want me to manage two folders where in folder 1a all files >> older than 4 week days of age are deleted, and in filder 1b all files older >> than 8 week days are deleted. >> >> They now want me to manage a different set of two folders with two >> different "thresholds" for how old they need to be in week days before they >> delete (ie. I now need additional rules for folders 2a and 2b). >> >> >> The issue is for each scenario there is a different 'offset' required >> depending on the day of the week the policy is run to maintian the number >> of weekdays required (the 'threshold' is always in weekdays, so intervening >> weekends need to be added to take them into account). >> >> For instance when run on a Monday, if the threshold were 4 weekdays of >> age, I need to be deleting files that were created on the previous Tuesday. >> Which is 6 days (ie 4 days + 2 weekend days). If the threshold was 8 week >> days the threhold in terms of the policy would be 12 (ie 8 plus 2x 2 >> weekend days). >> >> >> The only way I was able to work this out in the sql like policy file was >> to split the week days into groups where the offset would be the same (so >> for 4 week days, Monday through Thursday share the offset of 2 - which then >> has to be added to the 4 for the desired result) and then a separate rule >> for the Friday. >> >> >> However for every addition of a different threshold I have to write all >> new groups to match the days etc.. so the policy ends up with 6 rules but >> 150 lines of definition macros.... >> >> >> I was trying to work out if there was a more concise way of, within the >> sql like framework, programmatically calculating the day offest the needs >> to be added to the threshold to allow a more generic function that could >> just automatically work it out.... 
>> >> >> The algorithm I have recently thought up is to effectively calculate the >> difference in weeks between the current run time and the desired deletion >> day and multiply it by 2. >> >> >> Psudocode it would be (threshold is the number of week days for the rule, >> offset is the number that needs to be added to account for the weekends >> between those dates): >> >> >> If current day of month - threshold = sunday, then add 1 to the threshold >> value (sundays are de oted as the week start so Saturday would represent >> the previous week). >> >> Offset = (difference between current week and week of (current day of >> month - threshold)) x 2 >> >> A worked example: >> >> Threshold = 11 week days >> Policy run on the 21st Jan which is the week 4 of 2021 >> >> 21st - 11 days = Sunday 10th >> >> Therefore need to add 1 to threshold to push the day into the previous >> week. New threshold is 12 >> >> Saturday 9th is in week 2 of 2021 so the offset is week 4 - week 2 = 2 >> (ie difference in weeks) x 2 which is 4. >> >> Add 4 to the original 11 to make 15. >> >> So for the policy running on the 21st Jan to delete only files older than >> 11 week days of age I need to set my rule to be >> >> Delete where ((Current_date - creation_time) >= interval '15' days >> >> >> Unfortunately, I'm now struggling to implement that algorithm..... it >> seems the SQL-ness is very limited and I cant declare variables to use or >> stuff.... its a shame as that algorithm is generic so only needs to be >> written once and you could have ad many unique rules as you want all with >> different thresholds etc... >> >> Is there another way to get the same results? >> >> I would prefer to stay in the bounds of the SQL policy rule setup as that >> is the framework I have created and started to implement.. >> >> Hope the above gives more clarity to what Im asking.... sorry if one of >> the previous rplies addresses this, if it does I clearly was confused by >> the response (I seriously feel like an amateur at this at the moment and am >> having to learn all these finer things as I go). >> >> Thanks in advance, >> >> Owen. >> >> Owen Morgan? >> Data Wrangler >> Motion Picture Solutions Ltd >> T: >> E: *owen.morgan at motionpicturesolutions.com* >> | W: >> *motionpicturesolutions.com* >> A: Mission Hall, 9?11 North End Road , London , W14 8ST >> Motion Picture Solutions Ltd is a company registered in England and Wales >> under number 5388229, VAT number 201330482 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From owen.morgan at motionpicturesolutions.com Sat Jan 30 03:39:42 2021 From: owen.morgan at motionpicturesolutions.com (Owen Morgan) Date: Sat, 30 Jan 2021 03:39:42 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 Message-ID: Alec, Thank you for your response! I get it now! And, I also understand some of the other peoples responses better as well! Not only does this make sense I also suppose that it shows I have to broaden my 'ideas' as to what tools avaliable can be used more than mmapplypolicy and policy files alone. Using the power of all of them provides more ability than just focusing on one! 
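For completeness, the tagging idea Alec describes might look something like the sketch below: a periodic pass stamps each new file with an absolute purge date held in a user extended attribute (the attribute name and paths are invented here), and a single policy rule can later select anything whose XATTR('user.purge_after') is on or before today, whatever folder or threshold produced the tag. The add_weekdays helper is only a placeholder for the same weekday-counting logic, walking forward from the file's own timestamp.

#!/bin/bash
# illustrative tagging pass - attribute name, folder and threshold are made up
DIR=/gpfs/fs1/teamA/folder1a
THRESHOLD=4                                    # age limit in weekdays for this folder

find "$DIR" -type f -print0 | while IFS= read -r -d '' f; do
    # skip files that already carry a purge date
    getfattr -n user.purge_after --only-values "$f" >/dev/null 2>&1 && continue
    mtime=$(date -d "@$(stat -c %Y "$f")" +%F)
    due=$(add_weekdays "$mtime" "$THRESHOLD")   # placeholder helper: file date + THRESHOLD weekdays
    setfattr -n user.purge_after -v "$due" "$f"
done

Handling Alec's purge.modified caution would just mean storing the mtime seen at tagging time in a second attribute and re-tagging whenever it no longer matches the current mtime.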
Just want to thank you, and the other respondents as you've genuinely helped me and I've learnt new things in the process (until I posted the original question I didn't even know mmfind was a thing!) Thanks! Owen. Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Sat Jan 30 04:40:44 2021 From: anacreo at gmail.com (Alec) Date: Fri, 29 Jan 2021 20:40:44 -0800 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 In-Reply-To: References: Message-ID: No problem at all. If you can't get mmfind compiled... you can do everything it does via mmapplypolicy. But it is certainly easier with mmfind to add in options dynamically. I have modified the program that mmfind invokes... I forget offhand tr_Polsomething.pl to add functions such as -gpfsCompress_lz4 and -gpfsIsCompressed. Spectrum Scale really has way more power than most people know what to do with... I wish there was a much richer library of scripts available. For instance with mmfind, this saved my bacon a few days ago.. as our 416TB file system had less than 400GB free... mmfind -polArgs "-a 8 -N node1,node2 -B 20" /sasfilesystem -mtime +1800 -name '*.sas7bdat' -size +1G -not -gpfsIsCompressed -gpfsCompress_lz4 (I had to add in my own -gpfsIsCompressed and -gpfsCompress_lz4 features... but that was fairly easy) -- Find any file named '*.sas7bdat' over 1800 days (5 years), larger than 1G, and compress it down using lz4... Farmed it out to my two app nodes 8 threads each... and 14000 files compressed overnight. Next morning I had an extra 5TB of free space.. funny thing is I needed to run it on my app nodes to slow down their write capacity so we didn't get a fatal out of capacity. If you really want to have fun, check out the ksh93 built in time functions pairs nicely with this requirement. Output the day of the week corresponding to the last day of February 2008. $ printf "%(%a)T\n" "final day Feb 2008" Fri Output the date corresponding to the third Wednesday in May 2008. $ printf "%(%D)T\n" "3rd wednesday may 2008" 05/21/08 Output what date it was 4 weeks ago. $ printf "%(%D)T\n" "4 weeks ago" 02/18/08 Read more: https://blog.fpmurphy.com/2008/10/ksh93-date-manipulation.html#ixzz6l0Egm6hp On Fri, Jan 29, 2021 at 7:39 PM Owen Morgan < owen.morgan at motionpicturesolutions.com> wrote: > Alec, > > Thank you for your response! > > I get it now! And, I also understand some of the other peoples responses > better as well! > > Not only does this make sense I also suppose that it shows I have to > broaden my 'ideas' as to what tools avaliable can be used more than > mmapplypolicy and policy files alone. Using the power of all of them > provides more ability than just focusing on one! > > Just want to thank you, and the other respondents as you've genuinely > helped me and I've learnt new things in the process (until I posted the > original question I didn't even know mmfind was a thing!) > > Thanks! > > Owen. > > Owen Morgan? 
> Data Wrangler > Motion Picture Solutions Ltd > T: > E: *owen.morgan at motionpicturesolutions.com* > | W: > *motionpicturesolutions.com* > A: Mission Hall, 9?11 North End Road , London , W14 8ST > Motion Picture Solutions Ltd is a company registered in England and Wales > under number 5388229, VAT number 201330482 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Sat Jan 30 05:45:47 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Sat, 30 Jan 2021 05:45:47 +0000 Subject: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD Server with only ib rdma enabled Message-ID: <14218088180e4613847984c44e0321d8@Mail.EDVDesign.cloudia> Hi! Is it possible to mix OPAcards and Infininiband HCAs on the same server? In the faq https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#rdma They talk about RDMA : "RDMA is NOT supported on a node when both Mellanox HCAs and Intel Omni-Path HFIs are ENABLED for RDMA." So do I understand right: When we do NOT enable the opa interface we can still enable IB ? The reason I ask is, that we have a gpfs cluster of 6 NSD Servers (wih access to storage) with opa interfaces which provide access to remote cluster also via OPA. A new cluster with HDR interfaces will be implemented soon They shell have access to the same filesystems When we add HDR interfaces to NSD servers and enable rdma on this network while disabling rdma on opa we would accept the worse performance via opa . We hope that this provides still better perf and less technical overhead than using routers Or am I totally wrong? Thank you very much and keep healthy! Best regards Walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sat Jan 30 10:29:39 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 30 Jan 2021 10:29:39 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 In-Reply-To: References: Message-ID: On 30/01/2021 00:31, Owen Morgan wrote: [SNIP] > > I would prefer to stay in the bounds of the SQL policy rule setup as > that is the framework I have created and started to implement.. > In general SQL is Turing complete. Though I have not checked in detail I believe the SQL of the policy engine is too. I would also note that SQL has a whole bunch of time/date functions. So something like define(offset, 4) define(day, DAYOFWEEK(CURRENT_TIMESTAMP)) define(age,(DAYS(CURRENT_TIMESTAMP)-DAYS(ACCESS_TIME))) define(workingdays, CASE WHEN day=1 THEN offest+1 WHEN day=6 THEN offset WHEN day=7 THEN offset+1 ELSE offset+2 ) /* delete all files from files older than 4 working days */ RULE purge4 DELETE WHERE (age>workingdays) FOR FILESET dummies JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From giovanni.bracco at enea.it Sat Jan 30 17:07:43 2021 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Sat, 30 Jan 2021 18:07:43 +0100 Subject: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD Server with only ib rdma enabled In-Reply-To: <14218088180e4613847984c44e0321d8@Mail.EDVDesign.cloudia> References: <14218088180e4613847984c44e0321d8@Mail.EDVDesign.cloudia> Message-ID: <3bb0f4ca-f6ee-6013-45a0-e783470089f0@enea.it> In our HPC infrastructure we have 6 NSD server, running CentOS 7.4, each of them with with 1 Intel QDR HCA to a QDR Cluster (now 100 nodes SandyBridge cpu it was 300 nodes CentOS 6.5), 1 OPA HCA to the main OPA Cluster (400 nodes Skylake cpu, CentOS 7.3) and 1 Mellanox FDR to DDN storages and it works nicely using RDMA since 2018. GPFS 4.2.3-19. See F. Iannone et al., "CRESCO ENEA HPC clusters: a working example of a multifabric GPFS Spectrum Scale layout," 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, 2019, pp. 1051-1052, doi: 10.1109/HPCS48598.2019.918813 When setting up the system the main trick has been: just use CentOS drivers and do not install OFED We do not use IPoIB. Giovanni On 30/01/21 06:45, Walter Sklenka wrote: > Hi! > > Is it possible to mix OPAcards and Infininiband HCAs on the same server? > > In the faq > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#rdma > > > They talk about RDMA : > > ?RDMA is NOT ?supported on a node when both Mellanox HCAs and Intel > Omni-Path HFIs are ENABLED for RDMA.? > > So do I understand right: When we do NOT enable ?the opa interface we > can still enable IB ? > > The reason I ask ?is, that we have a gpfs cluster of 6 NSD Servers ?(wih > access to storage) ?with opa interfaces which provide access to remote > cluster ?also via OPA. > > A new cluster with HDR interfaces will be implemented soon > > They shell have access to the same filesystems > > When we add HDR interfaces to? NSD servers? and enable rdma on this > network ?while disabling rdma on opa we would accept the worse > performance via opa . We hope that this provides ?still better perf and > less technical overhead ?than using routers > > Or am I totally wrong? > > Thank you very much and keep healthy! > > Best regards > > Walter > > Mit freundlichen Gr??en > */Walter Sklenka/* > */Technical Consultant/* > > EDV-Design Informationstechnologie GmbH > Giefinggasse 6/1/2, A-1210 Wien > Tel: +43 1 29 22 165-31 > Fax: +43 1 29 22 165-90 > E-Mail: sklenka at edv-design.at > Internet: www.edv-design.at > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From Walter.Sklenka at EDV-Design.at Sat Jan 30 20:01:51 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Sat, 30 Jan 2021 20:01:51 +0000 Subject: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD Server with only ib rdma enabled In-Reply-To: <3bb0f4ca-f6ee-6013-45a0-e783470089f0@enea.it> References: <14218088180e4613847984c44e0321d8@Mail.EDVDesign.cloudia> <3bb0f4ca-f6ee-6013-45a0-e783470089f0@enea.it> Message-ID: Hi Giovanni! Thats great! Many thanks for your fast and detailed answer!!!! So this is the way we will go too! Have a nice weekend and keep healthy! Best regards Walter -----Original Message----- From: Giovanni Bracco Sent: Samstag, 30. 
J?nner 2021 18:08 To: gpfsug main discussion list ; Walter Sklenka Subject: Re: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD Server with only ib rdma enabled In our HPC infrastructure we have 6 NSD server, running CentOS 7.4, each of them with with 1 Intel QDR HCA to a QDR Cluster (now 100 nodes SandyBridge cpu it was 300 nodes CentOS 6.5), 1 OPA HCA to the main OPA Cluster (400 nodes Skylake cpu, CentOS 7.3) and 1 Mellanox FDR to DDN storages and it works nicely using RDMA since 2018. GPFS 4.2.3-19. See F. Iannone et al., "CRESCO ENEA HPC clusters: a working example of a multifabric GPFS Spectrum Scale layout," 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, 2019, pp. 1051-1052, doi: 10.1109/HPCS48598.2019.918813 When setting up the system the main trick has been: just use CentOS drivers and do not install OFED We do not use IPoIB. Giovanni On 30/01/21 06:45, Walter Sklenka wrote: > Hi! > > Is it possible to mix OPAcards and Infininiband HCAs on the same server? > > In the faq > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq. > html#rdma > > > They talk about RDMA : > > "RDMA is NOT ?supported on a node when both Mellanox HCAs and Intel > Omni-Path HFIs are ENABLED for RDMA." > > So do I understand right: When we do NOT enable ?the opa interface we > can still enable IB ? > > The reason I ask ?is, that we have a gpfs cluster of 6 NSD Servers ? > (wih access to storage) ?with opa interfaces which provide access to > remote cluster ?also via OPA. > > A new cluster with HDR interfaces will be implemented soon > > They shell have access to the same filesystems > > When we add HDR interfaces to? NSD servers? and enable rdma on this > network ?while disabling rdma on opa we would accept the worse > performance via opa . We hope that this provides ?still better perf > and less technical overhead ?than using routers > > Or am I totally wrong? > > Thank you very much and keep healthy! > > Best regards > > Walter > > Mit freundlichen Gr??en > */Walter Sklenka/* > */Technical Consultant/* > > EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 > Wien > Tel: +43 1 29 22 165-31 > Fax: +43 1 29 22 165-90 > E-Mail: sklenka at edv-design.at > Internet: www.edv-design.at > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From S.J.Thompson at bham.ac.uk Mon Jan 4 12:21:05 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 4 Jan 2021 12:21:05 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools Message-ID: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Hi All, We use Spectrum Protect (TSM) to backup our Scale filesystems. We have the backup setup to use multiple nodes with the PROXY node function turned on (and to some extent also use multiple target servers). This all feels like it is nice and parallel, on the TSM servers, we have disk pools for any ?small? files to drop into (I think we set anything smaller than 20GB) to prevent lots of small files stalling tape drive writes. Whilst digging into why we have slow backups at times, we found that the disk pool empties with a single thread (one drive). 
And looking at the docs: https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints This implies that we are limited to the number of client nodes stored in the pool. i.e. because we have one node and PROXY nodes, we are essentially limited to a single thread streaming out of the disk pool when full. Have we understood this correctly as if so, this appears to make the whole purpose of PROXY nodes sort of pointless if you have lots of small files. Or is there some other setting we should be looking at to increase the number of threads when the disk pool is emptying? (The disk pool itself has Migration Processes: 6) Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From jordi.caubet at es.ibm.com Mon Jan 4 13:36:40 2021 From: jordi.caubet at es.ibm.com (Jordi Caubet Serrabou) Date: Mon, 4 Jan 2021 13:36:40 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Jan 4 13:37:50 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 4 Jan 2021 19:07:50 +0530 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: Hi Diane, Can you help Simon with the below query. Or else would you know who would be the best person to be contacted here. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 04-01-2021 05.51 PM Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Protect and disk pools Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We use Spectrum Protect (TSM) to backup our Scale filesystems. We have the backup setup to use multiple nodes with the PROXY node function turned on (and to some extent also use multiple target servers). This all feels like it is nice and parallel, on the TSM servers, we have disk pools for any ?small? files to drop into (I think we set anything smaller than 20GB) to prevent lots of small files stalling tape drive writes. Whilst digging into why we have slow backups at times, we found that the disk pool empties with a single thread (one drive). And looking at the docs: https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints This implies that we are limited to the number of client nodes stored in the pool. i.e. because we have one node and PROXY nodes, we are essentially limited to a single thread streaming out of the disk pool when full. 
Have we understood this correctly as if so, this appears to make the whole purpose of PROXY nodes sort of pointless if you have lots of small files. Or is there some other setting we should be looking at to increase the number of threads when the disk pool is emptying? (The disk pool itself has Migration Processes: 6) Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Jan 4 13:52:05 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 4 Jan 2021 13:52:05 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: <62F6E92A-31B4-45BE-9FF7-E6DBE0F7526B@bham.ac.uk> Hi Jordi, Thanks, yes it is a disk pool: Protect: TSM01>q stg BACKUP_DISK f=d Storage Pool Name: BACKUP_DISK Storage Pool Type: Primary Device Class Name: DISK Storage Type: DEVCLASS ? Next Storage Pool: BACKUP_ONSTAPE So it is a disk pool ? though it is made up of multiple disk files ? /tsmdisk/stgpool/tsmins- BACKUP_DISK DISK 200.0 G 0.0 On-Line t3/bkup_diskvol01.dsm /tsmdisk/stgpool/tsmins- BACKUP_DISK DISK 200.0 G 0.0 On-Line t3/bkup_diskvol02.dsm /tsmdisk/stgpool/tsmins- BACKUP_DISK DISK 200.0 G 0.0 On-Line t3/bkup_diskvol03.dsm Will look into the FILE pool as this sounds like it might be less single threaded than now ? Thanks Simon From: on behalf of "jordi.caubet at es.ibm.com" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 4 January 2021 at 13:36 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Spectrum Protect and disk pools Simon, which kind of storage pool are you using, DISK or FILE ? I understand DISK pool from your mail. DISK pool does not behave the same as FILE pool. DISK pool is limited by the number of nodes or MIGProcess setting (the minimum of both) as the document states. Using proxy helps you backup in parallel from multiple nodes to the stg pool but from Protect perspective it is a single node. Even multiple nodes are sending they run "asnodename" so single node from Protect perspective. If using FILE pool, you can define the number of volumes within the FILE pool and when migrating to tape, it will migrate each volume in parallel with the limit of MIGProcess setting. So it would be the minimum of #volumes and MIGProcess value. I know more deep technical skills in Protect are on this mailing list so feel free to add something or correct me. Best Regards, -- Jordi Caubet Serrabou IBM Storage Client Technical Specialist (IBM Spain) Ext. Phone: (+34) 679.79.17.84 (internal 55834) E-mail: jordi.caubet at es.ibm.com -----gpfsug-discuss-bounces at spectrumscale.org wrote: ----- To: "gpfsug-discuss at spectrumscale.org" > From: Simon Thompson Sent by: gpfsug-discuss-bounces at spectrumscale.org Date: 01/04/2021 01:21PM Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Protect and disk pools Hi All, We use Spectrum Protect (TSM) to backup our Scale filesystems. We have the backup setup to use multiple nodes with the PROXY node function turned on (and to some extent also use multiple target servers). This all feels like it is nice and parallel, on the TSM servers, we have disk pools for any ?small? 
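For anyone following along, moving the small-file landing zone from a random-access DISK pool to a sequential FILE pool is mostly a matter of defining a FILE device class and a new primary pool in front of the tape pool. The commands below are only a hedged outline - the device class, pool name, directory, sizes and thresholds are invented, and the exact parameters are worth checking against the Spectrum Protect documentation before use:

# run against the Protect server with dsmadmc; all names and values are illustrative
dsmadmc -id=admin -password=XXX "define devclass FILE_DC devtype=file directory=/tsmdisk/stgpool/filepool maxcapacity=50G mountlimit=32"
dsmadmc -id=admin -password=XXX "define stgpool BACKUP_FILE FILE_DC maxscratch=100 nextstgpool=BACKUP_ONSTAPE"
dsmadmc -id=admin -password=XXX "update stgpool BACKUP_FILE migprocess=6 highmig=90 lowmig=0"
# finally repoint the copy group destination (or the relevant policy domain) from BACKUP_DISK to BACKUP_FILE

Migration out of a pool defined this way can then run one process per volume up to the MIGPROCESS limit, which is the behaviour described above.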
files to drop into (I think we set anything smaller than 20GB) to prevent lots of small files stalling tape drive writes. Whilst digging into why we have slow backups at times, we found that the disk pool empties with a single thread (one drive). And looking at the docs: https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints This implies that we are limited to the number of client nodes stored in the pool. i.e. because we have one node and PROXY nodes, we are essentially limited to a single thread streaming out of the disk pool when full. Have we understood this correctly as if so, this appears to make the whole purpose of PROXY nodes sort of pointless if you have lots of small files. Or is there some other setting we should be looking at to increase the number of threads when the disk pool is emptying? (The disk pool itself has Migration Processes: 6) Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Salvo indicado de otro modo m?s arriba / Unless stated otherwise above: International Business Machines, S.A. Santa Hortensia, 26-28, 28002 Madrid Registro Mercantil de Madrid; Folio 1; Tomo 1525; Hoja M-28146 CIF A28-010791 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Mon Jan 4 15:27:31 2021 From: skylar2 at uw.edu (Skylar Thompson) Date: Mon, 4 Jan 2021 07:27:31 -0800 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: <20210104152731.mwgcj2caojjalony@thargelion> I think the collocation settings of the target pool for the migration come into play as well. If you have multiple filespaces associated with a node and collocation is set to FILESPACE, then you should be able to get one migration process per filespace rather than one per node/collocation group. On Mon, Jan 04, 2021 at 12:21:05PM +0000, Simon Thompson wrote: > Hi All, > > We use Spectrum Protect (TSM) to backup our Scale filesystems. We have the backup setup to use multiple nodes with the PROXY node function turned on (and to some extent also use multiple target servers). > > This all feels like it is nice and parallel, on the TSM servers, we have disk pools for any ???small??? files to drop into (I think we set anything smaller than 20GB) to prevent lots of small files stalling tape drive writes. > > Whilst digging into why we have slow backups at times, we found that the disk pool empties with a single thread (one drive). And looking at the docs: > https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints > > This implies that we are limited to the number of client nodes stored in the pool. i.e. because we have one node and PROXY nodes, we are essentially limited to a single thread streaming out of the disk pool when full. > > Have we understood this correctly as if so, this appears to make the whole purpose of PROXY nodes sort of pointless if you have lots of small files. Or is there some other setting we should be looking at to increase the number of threads when the disk pool is emptying? 
(The disk pool itself has Migration Processes: 6) > > Thanks > > Simon > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From jonathan.buzzard at strath.ac.uk Mon Jan 4 16:24:25 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 4 Jan 2021 16:24:25 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: On 04/01/2021 12:21, Simon Thompson wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > > Hi All, > > We use Spectrum Protect (TSM) to backup our Scale filesystems. We have > the backup setup to use multiple nodes with the PROXY node function > turned on (and to some extent also use multiple target servers). > > This all feels like it is nice and parallel, on the TSM servers, we have > disk pools for any ?small? files to drop into (I think we set anything > smaller than 20GB) to prevent lots of small files stalling tape drive > writes. > > Whilst digging into why we have slow backups at times, we found that the > disk pool empties with a single thread (one drive). And looking at the docs: > > https://www.ibm.com/support/pages/concurrent-migration-processes-and-constraints > > > This implies that we are limited to the number of client nodes stored in > the pool. i.e. because we have one node and PROXY nodes, we are > essentially limited to a single thread streaming out of the disk pool > when full. > > Have we understood this correctly as if so, this appears to make the > whole purpose of PROXY nodes sort of pointless if you have lots of small > files. Or is there some other setting we should be looking at to > increase the number of threads when the disk pool is emptying? (The disk > pool itself has Migration Processes: 6) > I have found in the past that the speed of the disk pool can make a large difference. That is a RAID5/6 of 7200RPM drives was inadequate and there was a significant boost in backup in moving to 15k RPM disks. Also your DB really needs to be on SSD, again this affords a large boost in backup speed. The other rule of thumb I have always worked with is that the disk pool should be sized for the daily churn. That is your backup should disappear into the disk pool and then when the backup is finished you can then spit the disk pool out to the primary and copy pools. If you are needing to drain the disk pool mid backup your disk pool is too small. TL;DR your TSM disks (DB and disk pool) need to be some of the best storage you have to maximize backup speed. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From Alec.Effrat at wellsfargo.com Mon Jan 4 17:30:39 2021 From: Alec.Effrat at wellsfargo.com (Alec.Effrat at wellsfargo.com) Date: Mon, 4 Jan 2021 17:30:39 +0000 Subject: [gpfsug-discuss] Spectrum Protect and disk pools In-Reply-To: References: <36E75FAC-F5D8-45B1-B2AB-EAF4922A0DC6@bham.ac.uk> Message-ID: <151a06b1b52545fca2f92d3a5e3ce943@wellsfargo.com> I am not sure what platform you run on but for AIX with a fully virtualized LPAR we needed to enable "mtu_bypass" on the en device that was used for our backups. Prior to this setting we could not exceed 250 MB/s on our 10G interface, after that we run at 1.6GB/s solid per 10G virtual adapter, fueled by Spectrum Scale and a different backup engine. We did lose a lot of sleep trying to figure this one out, but are very pleased with the end result. Alec Effrat SAS Lead, AVP Business Intelligence Competency Center SAS Administration Cell?949-246-7713 alec.effrat at wellsfargo.com -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: Monday, January 4, 2021 8:24 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Protect and disk pools On 04/01/2021 12:21, Simon Thompson wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > > Hi All, > > We use Spectrum Protect (TSM) to backup our Scale filesystems. We have > the backup setup to use multiple nodes with the PROXY node function > turned on (and to some extent also use multiple target servers). > > This all feels like it is nice and parallel, on the TSM servers, we > have disk pools for any ?small? files to drop into (I think we set > anything smaller than 20GB) to prevent lots of small files stalling > tape drive writes. > > Whilst digging into why we have slow backups at times, we found that > the disk pool empties with a single thread (one drive). And looking at the docs: > > https://www.ibm.com/support/pages/concurrent-migration-processes-and-c > onstraints > .ibm.com%2Fsupport%2Fpages%2Fconcurrent-migration-processes-and-constr > aints&data=04%7C01%7Cjonathan.buzzard%40strath.ac.uk%7C99158004dad04c7 > 9a58808d8b0ab39b8%7C631e0763153347eba5cd0457bee5944e%7C0%7C0%7C6374535 > 96745356438%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMz > IiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=ZPUkTB5Vy5S0%2BL67neMp4C > 1lxIuphMS5HuTkBYcmcMU%3D&reserved=0> > > This implies that we are limited to the number of client nodes stored > in the pool. i.e. because we have one node and PROXY nodes, we are > essentially limited to a single thread streaming out of the disk pool > when full. > > Have we understood this correctly as if so, this appears to make the > whole purpose of PROXY nodes sort of pointless if you have lots of > small files. Or is there some other setting we should be looking at to > increase the number of threads when the disk pool is emptying? (The > disk pool itself has Migration Processes: 6) > I have found in the past that the speed of the disk pool can make a large difference. That is a RAID5/6 of 7200RPM drives was inadequate and there was a significant boost in backup in moving to 15k RPM disks. Also your DB really needs to be on SSD, again this affords a large boost in backup speed. The other rule of thumb I have always worked with is that the disk pool should be sized for the daily churn. 
That is your backup should disappear into the disk pool and then when the backup is finished you can then spit the disk pool out to the primary and copy pools. If you are needing to drain the disk pool mid backup your disk pool is too small. TL;DR your TSM disks (DB and disk pool) need to be some of the best storage you have to maximize backup speed. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From xhejtman at ics.muni.cz Wed Jan 6 17:46:58 2021 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 6 Jan 2021 18:46:58 +0100 Subject: [gpfsug-discuss] S3 API and POSIX rights Message-ID: <20210106174658.GA1764842@ics.muni.cz> Hello, we are playing a bit with Spectrum Scale OBJ storage. We were able to get working unified access for NFS and OBJ but only if we use swift clients. If we use s3 client for OBJ, all objects are owned by swift user and large objects are multiparted wich is not suitable for unified access. Should the unified access work also for S3 API? Or only swift is supported currently? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From jcatana at gmail.com Wed Jan 6 22:30:08 2021 From: jcatana at gmail.com (Josh Catana) Date: Wed, 6 Jan 2021 17:30:08 -0500 Subject: [gpfsug-discuss] S3 API and POSIX rights In-Reply-To: <20210106174658.GA1764842@ics.muni.cz> References: <20210106174658.GA1764842@ics.muni.cz> Message-ID: Swift and s3 are both object storage, but different protocol implementation. Not compatible. I use minio to share data for s3 compatibility. On Wed, Jan 6, 2021, 12:52 PM Lukas Hejtmanek wrote: > Hello, > > we are playing a bit with Spectrum Scale OBJ storage. We were able to get > working unified access for NFS and OBJ but only if we use swift clients. > If we > use s3 client for OBJ, all objects are owned by swift user and large > objects > are multiparted wich is not suitable for unified access. > > Should the unified access work also for S3 API? Or only swift is supported > currently? > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brnelson at us.ibm.com Thu Jan 7 00:18:28 2021 From: brnelson at us.ibm.com (Brian Nelson) Date: Wed, 6 Jan 2021 18:18:28 -0600 Subject: [gpfsug-discuss] S3 API and POSIX rights Message-ID: Unfortunately, these features are not supported. Multipart uploads are not supported with Unified File and Object for the reason you mentioned, as the separate parts of the object are written as separate files. And because the S3 and Swift authentication is handled differently, the user is not passed through in the S3 path. Without the user information, the Unified File and Object layer is not able to set the file ownership to the external authentication user. Ownership is set to the default of 'swift' in that case. 
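If the requirement is really just an S3 endpoint onto data that also has to stay usable from the POSIX side, the MinIO route Josh mentions can be as small as pointing a MinIO server at a directory in the filesystem. A hedged sketch follows - the path, port and credentials are invented, and the credential variable names changed between MinIO releases (older builds use MINIO_ACCESS_KEY/MINIO_SECRET_KEY):

# illustrative only - export an existing Scale directory over S3 with MinIO
export MINIO_ROOT_USER=s3admin
export MINIO_ROOT_PASSWORD='pick-a-strong-secret'
minio server /gpfs/fs1/s3export --address :9000

With the older single-drive filesystem backend, objects were laid out as plain files under /gpfs/fs1/s3export/<bucket>/ owned by whatever uid MinIO runs as, so a similar ownership caveat to the 'swift' user described above still applies; newer MinIO releases store objects in their own on-disk format rather than as plain files, which is worth checking before relying on this for unified access.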
-Brian =================================== Brian Nelson 512-286-7735 (T/L) 363-7735 IBM Spectrum Scale brnelson at us.ibm.com On Wed, Jan 6, 2021, 12:52 PM Lukas Hejtmanek wrote: > Hello, > > we are playing a bit with Spectrum Scale OBJ storage. We were able to get > working unified access for NFS and OBJ but only if we use swift clients. > If we > use s3 client for OBJ, all objects are owned by swift user and large > objects > are multiparted wich is not suitable for unified access. > > Should the unified access work also for S3 API? Or only swift is supported > currently? > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Jan 7 08:36:25 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 7 Jan 2021 14:06:25 +0530 Subject: [gpfsug-discuss] S3 API and POSIX rights In-Reply-To: <20210106174658.GA1764842@ics.muni.cz> References: <20210106174658.GA1764842@ics.muni.cz> Message-ID: Hi Brian, Can you please answer the below S3 API related query. Or would you know who would be the right person to forward this to. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Lukas Hejtmanek To: gpfsug-discuss at spectrumscale.org Date: 06-01-2021 11.22 PM Subject: [EXTERNAL] [gpfsug-discuss] S3 API and POSIX rights Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, we are playing a bit with Spectrum Scale OBJ storage. We were able to get working unified access for NFS and OBJ but only if we use swift clients. If we use s3 client for OBJ, all objects are owned by swift user and large objects are multiparted wich is not suitable for unified access. Should the unified access work also for S3 API? Or only swift is supported currently? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From juergen.hannappel at desy.de Fri Jan 8 12:27:27 2021 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Fri, 8 Jan 2021 13:27:27 +0100 (CET) Subject: [gpfsug-discuss] GPFS_CLEAR_FILE_CACHE fails on Read-Only FS Message-ID: <933204168.28588491.1610108847123.JavaMail.zimbra@desy.de> Hi, in a program after reading a file I did a gpfs_fcntl() with GPFS_CLEAR_FILE_CACHE to get rid of the now unused pages in the file cache. That works fine, but if the file system is read-only (in a remote cluster) this fails with a message that the file system is read only. Is that expected behaviour or an unexpected feature (aka bug)? -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1711 bytes Desc: S/MIME Cryptographic Signature URL: From scale at us.ibm.com Fri Jan 8 13:42:25 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 8 Jan 2021 08:42:25 -0500 Subject: [gpfsug-discuss] GPFS_CLEAR_FILE_CACHE fails on Read-Only FS In-Reply-To: <933204168.28588491.1610108847123.JavaMail.zimbra@desy.de> References: <933204168.28588491.1610108847123.JavaMail.zimbra@desy.de> Message-ID: It seems like a defect. Could you please open a help case and if possible provide a sample program and the steps you took to create the problem? Also, please provide the version of Scale you are using where you see this behavior. This should result in a defect being opened against GPFS which will then be addressed by a member of the development team. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Hannappel, Juergen" To: gpfsug main discussion list Date: 01/08/2021 07:33 AM Subject: [EXTERNAL] [gpfsug-discuss] GPFS_CLEAR_FILE_CACHE fails on Read-Only FS Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, in a program after reading a file I did a gpfs_fcntl() with GPFS_CLEAR_FILE_CACHE to get rid of the now unused pages in the file cache. That works fine, but if the file system is read-only (in a remote cluster) this fails with a message that the file system is read only. Is that expected behaviour or an unexpected feature (aka bug)? -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 [attachment "smime.p7s" deleted by Frederick Stock/Pittsburgh/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hoov at us.ibm.com Mon Jan 11 17:32:48 2021 From: hoov at us.ibm.com (Theodore Hoover Jr) Date: Mon, 11 Jan 2021 17:32:48 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cloud Online Survey In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16082105961220.jpg Type: image/jpeg Size: 6839 bytes Desc: not available URL: From Philipp.Rehs at uni-duesseldorf.de Mon Jan 11 18:53:29 2021 From: Philipp.Rehs at uni-duesseldorf.de (Rehs, Philipp Helo) Date: Mon, 11 Jan 2021 18:53:29 +0000 Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Message-ID: <63152da6-4464-4497-b4d2-11f8d2260614@email.android.com> Hello, we have gpfs GUI on 4.2.3.22 running and it suddenly stopped to create new snapshots from schedule. I can manually create snapshots but none is created from schedule. How can I debug it? Kind regards Philipp Rehs -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jan 11 19:07:44 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 11 Jan 2021 19:07:44 +0000 Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Message-ID: Have you tried restarting the gpfs.gui service? At some point in the past we have seen similar and restarting the GUI made it start again. Simon From: on behalf of "Philipp.Rehs at uni-duesseldorf.de" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 11 January 2021 at 19:03 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Hello, we have gpfs GUI on 4.2.3.22 running and it suddenly stopped to create new snapshots from schedule. I can manually create snapshots but none is created from schedule. How can I debug it? Kind regards Philipp Rehs -------------- next part -------------- An HTML attachment was scrubbed... URL: From Philipp.Rehs at uni-duesseldorf.de Mon Jan 11 19:16:52 2021 From: Philipp.Rehs at uni-duesseldorf.de (Rehs, Philipp Helo) Date: Mon, 11 Jan 2021 19:16:52 +0000 Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Message-ID: Hello Simon, I have already rebooted the server but no change. I also see no calls to mmcrsnapshot in the journalctl sudo log. Maybe there is a service which is not running? Kind regards Philipp Am 11.01.2021 20:07 schrieb Simon Thompson : Have you tried restarting the gpfs.gui service? At some point in the past we have seen similar and restarting the GUI made it start again. Simon From: on behalf of "Philipp.Rehs at uni-duesseldorf.de" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 11 January 2021 at 19:03 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Hello, we have gpfs GUI on 4.2.3.22 running and it suddenly stopped to create new snapshots from schedule. I can manually create snapshots but none is created from schedule. How can I debug it? Kind regards Philipp Rehs -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Tue Jan 12 10:46:16 2021 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Tue, 12 Jan 2021 11:46:16 +0100 Subject: [gpfsug-discuss] GPFS GUI does not create snapshots In-Reply-To: References: Message-ID: Hello Philipp. there is no additional service that covers the snapshot scheduling besides the GUI service. 
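A few read-only checks on the GUI node can narrow this down before, or while, a support case is open. The service name below is the standard gpfsgui unit shipped with the GUI package, the filesystem name is a placeholder, and the log directory is the GUI application log location mentioned just below:

# basic triage on the GUI node - nothing here changes configuration except the final restart
systemctl status gpfsgui                         # the GUI/REST service owns the snapshot scheduler
journalctl -u gpfsgui --since today              # look for scheduler errors around the expected run times
mmlssnapshot fs1                                 # substitute your filesystem; confirms what snapshots actually exist
grep -ri snapshot /var/log/cnlog/mgtsrv/ | tail -50   # GUI application logs
systemctl restart gpfsgui                        # the restart that has helped in similar cases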
Please note, that in case you have two GUI instances running, the snapshot scheduling would have moved to the second instance in case you reboot. The GUI/REST application logs are located in /var/log/cnlog/mgtsrv, but I propose to open a support case for this issue. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder IBM Systems / Lab Services Europe / EMEA Storage Competence Center Phone: +49 162 4159920 IBM Deutschland GmbH E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Sebastian Krause / Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer / Sitz der Gesellschaft: 71139 Ehningen, IBM-Allee 1 / Registergericht: Amtsgericht Stuttgart, HRB14562 From: "Rehs, Philipp Helo" To: gpfsug main discussion list Date: 11.01.2021 20:17 Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS GUI does not create snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Simon, I have already rebooted the server but no change. I also see no calls to mmcrsnapshot in the journalctl sudo log. Maybe there is a service which is not running? Kind regards Philipp Am 11.01.2021 20:07 schrieb Simon Thompson : Have you tried restarting the gpfs.gui service? At some point in the past we have seen similar and restarting the GUI made it start again. Simon From: on behalf of "Philipp.Rehs at uni-duesseldorf.de" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 11 January 2021 at 19:03 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] GPFS GUI does not create snapshots Hello, we have gpfs GUI on 4.2.3.22 running and it suddenly stopped to create new snapshots from schedule. I can manually create snapshots but none is created from schedule. How can I debug it? Kind regards Philipp Rehs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1E685739.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From cabrillo at ifca.unican.es Tue Jan 12 14:32:23 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Tue, 12 Jan 2021 15:32:23 +0100 (CET) Subject: [gpfsug-discuss] Disk in unrecovered state Message-ID: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> Dear, Since this moning I have a couple of disk (7) in down state, I have tried to start them again but after that they change to unrecovered. These "failed" disk are only DATA. Both pool Data and Metadata has two failures groups, and set replica to 2. The Metadata disks are in two different enclosures one for each filure group. The filesystem has been unmounted , but when i have tried to run the mmfsck told me the I should remove the down disk [root at gpfs06 ~]# mmlsdisk gpfs2 -L | grep -v up disk driver sector failure holds holds storage ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- ..... 
nsd18jbod1 nsd 512 2 No Yes to be emptied unrecovered 26 data nsd19jbod1 nsd 512 2 No Yes ready unrecovered 27 data nsd19jbod2 nsd 512 3 No Yes ready down 46 data nsd24jbod2 nsd 512 3 No Yes ready down 51 data nsd57jbod1 nsd 512 2 No Yes ready down 109 data nsd61jbod1 nsd 512 2 No Yes ready down 113 data nsd71jbod1 nsd 512 2 No Yes ready down 123 data ..... Any help is welcomed. Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Jan 12 15:11:22 2021 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 12 Jan 2021 15:11:22 +0000 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> Message-ID: Definitely recommend getting a IBM Case in and ask someone for direct assistance (Zoom even). Then also check that you can access all of the underlying storage with READ ONLY operations from all defined NSD Servers in the NSD ServerList for nsd18jbod1 and nsd19jbod1. Given the name of the NSDs, sound like there is not any RAID protection on theses disks. If so then you would have serious data loss issues with one of the drives corrupted. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Iban Cabrillo Sent: Tuesday, January 12, 2021 8:32 AM To: gpfsug-discuss Subject: [gpfsug-discuss] Disk in unrecovered state [EXTERNAL EMAIL] Dear, Since this moning I have a couple of disk (7) in down state, I have tried to start them again but after that they change to unrecovered. These "failed" disk are only DATA. Both pool Data and Metadata has two failures groups, and set replica to 2. The Metadata disks are in two different enclosures one for each filure group. The filesystem has been unmounted , but when i have tried to run the mmfsck told me the I should remove the down disk [root at gpfs06 ~]# mmlsdisk gpfs2 -L | grep -v up disk driver sector failure holds holds storage ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- ..... nsd18jbod1 nsd 512 2 No Yes to be emptied unrecovered 26 data nsd19jbod1 nsd 512 2 No Yes ready unrecovered 27 data nsd19jbod2 nsd 512 3 No Yes ready down 46 data nsd24jbod2 nsd 512 3 No Yes ready down 51 data nsd57jbod1 nsd 512 2 No Yes ready down 109 data nsd61jbod1 nsd 512 2 No Yes ready down 113 data nsd71jbod1 nsd 512 2 No Yes ready down 123 data ..... Any help is welcomed. Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Jan 12 15:21:33 2021 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 12 Jan 2021 15:21:33 +0000 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> Message-ID: <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Hallo Iban, first you should check the path to the disk. (mmlsnsd -m) It seems to be broken from the OS view. This should fixed first. If you see no dev entry you have a HW problem. If this is fixed then you can start each disk individuell to see there are something start here. On wich scale version do you are? 
Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Thomas Sehn, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Iban Cabrillo Gesendet: Dienstag, 12. Januar 2021 15:32 An: gpfsug-discuss Betreff: [gpfsug-discuss] Disk in unrecovered state Dear, Since this moning I have a couple of disk (7) in down state, I have tried to start them again but after that they change to unrecovered. These "failed" disk are only DATA. Both pool Data and Metadata has two failures groups, and set replica to 2. The Metadata disks are in two different enclosures one for each filure group. The filesystem has been unmounted , but when i have tried to run the mmfsck told me the I should remove the down disk [root at gpfs06 ~]# mmlsdisk gpfs2 -L | grep -v up disk driver sector failure holds holds storage ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- ..... nsd18jbod1 nsd 512 2 No Yes to be emptied unrecovered 26 data nsd19jbod1 nsd 512 2 No Yes ready unrecovered 27 data nsd19jbod2 nsd 512 3 No Yes ready down 46 data nsd24jbod2 nsd 512 3 No Yes ready down 51 data nsd57jbod1 nsd 512 2 No Yes ready down 109 data nsd61jbod1 nsd 512 2 No Yes ready down 113 data nsd71jbod1 nsd 512 2 No Yes ready down 123 data ..... Any help is welcomed. Regards, I -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cabrillo at ifca.unican.es Tue Jan 12 15:59:03 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Tue, 12 Jan 2021 16:59:03 +0100 (CET) Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Message-ID: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> Hi Renar, The version we are installed is 5.0.4-3, and the paths to these wrong disks seems to be fine: [root at gpfs06 ~]# mmlsnsd -m| grep nsd18jbod1 nsd18jbod1 0A0A00675EE76CF5 /dev/sds gpfs05.ifca.es server node nsd18jbod1 0A0A00675EE76CF5 /dev/sdby gpfs06.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd19jbod1 nsd19jbod1 0A0A00665EE76CF6 /dev/sdt gpfs05.ifca.es server node nsd19jbod1 0A0A00665EE76CF6 /dev/sdaa gpfs06.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd19jbod2 nsd19jbod2 0A0A00695EE79A12 /dev/sdt gpfs07.ifca.es server node nsd19jbod2 0A0A00695EE79A12 /dev/sdat gpfs08.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd24jbod2 nsd24jbod2 0A0A00685EE79749 /dev/sdbn gpfs07.ifca.es server node nsd24jbod2 0A0A00685EE79749 /dev/sdcg gpfs08.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd57jbod1 nsd57jbod1 0A0A00665F243CE1 /dev/sdbg gpfs05.ifca.es server node nsd57jbod1 0A0A00665F243CE1 /dev/sdbx gpfs06.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd61jbod1 nsd61jbod1 0A0A00665F243CFA /dev/sdbk gpfs05.ifca.es server node nsd61jbod1 0A0A00665F243CFA /dev/sdy gpfs06.ifca.es server node [root at gpfs06 ~]# mmlsnsd -m| grep nsd71jbod1 nsd71jbod1 0A0A00665F243D38 /dev/sdbu gpfs05.ifca.es server node nsd71jbod1 0A0A00665F243D38 /dev/sdbv gpfs06.ifca.es server node trying to start 19jbod1 again: [root at gpfs06 ~]# mmchdisk gpfs2 start -d nsd19jbod1 mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. gpfs06.ifca.es: Rediscovered nsd server access to nsd19jbod1. gpfs05.ifca.es: Rediscovered nsd server access to nsd19jbod1. Failed to open gpfs2. Log recovery failed. Input/output error Initial disk state was updated successfully, but another error may have changed the state again. mmchdisk: Command failed. Examine previous error messages to determine cause. Regards, I From olaf.weiser at de.ibm.com Tue Jan 12 16:30:24 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 12 Jan 2021 16:30:24 +0000 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> References: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es>, <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es><3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Message-ID: An HTML attachment was scrubbed... URL: From nikhilk at us.ibm.com Tue Jan 12 17:32:08 2021 From: nikhilk at us.ibm.com (Nikhil Khandelwal) Date: Tue, 12 Jan 2021 17:32:08 +0000 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: References: , <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es>, <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es><3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Message-ID: An HTML attachment was scrubbed... 
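Putting the checks suggested in this thread together — confirm the device path with mmlsnsd -m, then do a strictly read-only read from each device on its server node before retrying mmchdisk start — a rough, untested sketch (the NSD and device names are simply the ones from the listing above):

#!/bin/bash
# Rough sketch: show device/server pairs for the affected NSDs,
# then read a few blocks from each device on its server node (read-only!).
for nsd in nsd18jbod1 nsd19jbod1 nsd19jbod2 nsd24jbod2 nsd57jbod1 nsd61jbod1 nsd71jbod1; do
    /usr/lpp/mmfs/bin/mmlsnsd -m | awk -v n="$nsd" '$1 == n {print n": "$4" -> "$3}'
done
# then on each server node listed, for example on gpfs05 for nsd18jbod1:
#   dd if=/dev/sds of=/dev/null bs=4k count=16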
URL: From cabrillo at ifca.unican.es Wed Jan 13 10:23:20 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Wed, 13 Jan 2021 11:23:20 +0100 (CET) Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: References: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> Message-ID: <538725591.388718.1610533400379.JavaMail.zimbra@ifca.unican.es> Hi Guys, Devices seems to be accesible from both server primary and secondary, and thr harware state is "Optimal" [root at gpfs05 ~]# mmlsnsd -m| grep nsd18jbod1 nsd18jbod1 0A0A00675EE76CF5 /dev/sds gpfs05.ifca.es server node nsd18jbod1 0A0A00675EE76CF5 /dev/sdby gpfs06.ifca.es server node [root at gpfs05 ~]# #dd if=/dev/sds [root at gpfs05 ~]# man od [root at gpfs05 ~]# dd if=/dev/sds bs=4k count=2 | od -c 2+0 records in 2+0 records out 8192 bytes (8.2 kB) copied, 0.000249162 s, 32.9 MB/s 0000000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0000700 001 \0 356 376 377 377 001 \0 \0 \0 377 377 377 377 \0 \0 0000720 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0000760 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 U 252 0001000 E F I P A R T \0 \0 001 \0 \ \0 \0 \0 0001020 \r 0 267 u \0 \0 \0 \0 001 \0 \0 \0 \0 \0 \0 \0 0001040 257 * 201 243 003 \0 \0 \0 " \0 \0 \0 \0 \0 \0 \0 0001060 216 * 201 243 003 \0 \0 \0 240 ! 302 3 . R \f M 0001100 200 241 323 024 245 h | G 002 \0 \0 \0 \0 \0 \0 \0 0001120 200 \0 \0 \0 200 \0 \0 \0 p b 203 F \0 \0 \0 \0 0001140 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0002000 220 374 257 7 } 357 226 N 221 303 - z 340 U 261 t 0002020 316 343 324 ' } 033 K C 203 a 314 = 220 k 336 023 0002040 0 \0 \0 \0 \0 \0 \0 \0 177 * 201 243 003 \0 \0 \0 0002060 001 \0 \0 \0 \0 \0 \0 @ G \0 P \0 F \0 S \0 0002100 : \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 0002120 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0020000 Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Wed Jan 13 11:51:44 2021 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Wed, 13 Jan 2021 12:51:44 +0100 Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es><3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> Message-ID: Hi Iban, given that you have physical access to the disks and they are readable ( i see you checked that via dd command) you should mmchdisk start them. Note: as you have down disks in more than one FG, you will need to be able to at least get one good copy of the metadata readable .. in order to be able to mmchdisk start a disk. 
In that case i would run : mmchdisk start -a (so gpfs can get data from all readable disks) Mit freundlichen Gr??en / Kind regards Achim Rehor Remote Technical Support Engineer Storage IBM Systems Storage Support - EMEA Storage Competence Center (ESCC) Spectrum Scale / Elastic Storage Server ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49-170-4521194 E-Mail: Achim.Rehor at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Sebastian Krause Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 gpfsug-discuss-bounces at spectrumscale.org wrote on 12/01/2021 16:59:03: > From: Iban Cabrillo > To: gpfsug-discuss > Date: 12/01/2021 16:59 > Subject: [EXTERNAL] Re: [gpfsug-discuss] Disk in unrecovered state > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Hi Renar, > The version we are installed is 5.0.4-3, and the paths to these > wrong disks seems to be fine: > > [root at gpfs06 ~]# mmlsnsd -m| grep nsd18jbod1 > nsd18jbod1 0A0A00675EE76CF5 /dev/sds gpfs05.ifca.es > server node > nsd18jbod1 0A0A00675EE76CF5 /dev/sdby gpfs06.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd19jbod1 > nsd19jbod1 0A0A00665EE76CF6 /dev/sdt gpfs05.ifca.es > server node > nsd19jbod1 0A0A00665EE76CF6 /dev/sdaa gpfs06.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd19jbod2 > nsd19jbod2 0A0A00695EE79A12 /dev/sdt gpfs07.ifca.es > server node > nsd19jbod2 0A0A00695EE79A12 /dev/sdat gpfs08.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd24jbod2 > nsd24jbod2 0A0A00685EE79749 /dev/sdbn gpfs07.ifca.es > server node > nsd24jbod2 0A0A00685EE79749 /dev/sdcg gpfs08.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd57jbod1 > nsd57jbod1 0A0A00665F243CE1 /dev/sdbg gpfs05.ifca.es > server node > nsd57jbod1 0A0A00665F243CE1 /dev/sdbx gpfs06.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd61jbod1 > nsd61jbod1 0A0A00665F243CFA /dev/sdbk gpfs05.ifca.es > server node > nsd61jbod1 0A0A00665F243CFA /dev/sdy gpfs06.ifca.es > server node > [root at gpfs06 ~]# mmlsnsd -m| grep nsd71jbod1 > nsd71jbod1 0A0A00665F243D38 /dev/sdbu gpfs05.ifca.es > server node > nsd71jbod1 0A0A00665F243D38 /dev/sdbv gpfs06.ifca.es > server node > > trying to start 19jbod1 again: > [root at gpfs06 ~]# mmchdisk gpfs2 start -d nsd19jbod1 > mmnsddiscover: Attempting to rediscover the disks. This may take awhile ... > mmnsddiscover: Finished. > gpfs06.ifca.es: Rediscovered nsd server access to nsd19jbod1. > gpfs05.ifca.es: Rediscovered nsd server access to nsd19jbod1. > Failed to open gpfs2. > Log recovery failed. > Input/output error > Initial disk state was updated successfully, but another error may > have changed the state again. > mmchdisk: Command failed. Examine previous error messages to determine cause. 
> > Regards, I > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > INVALID URI REMOVED > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=f4oAWXtPlhIm5cEShA0Amlf1ZUG3PyXvVbzB9e- > I3hk&s=SA1wXw8XXPjvMbSU6TILc2vnC4KxkfoboM8RolqBmuc&e= > From cabrillo at ifca.unican.es Wed Jan 13 12:26:17 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Wed, 13 Jan 2021 13:26:17 +0100 (CET) Subject: [gpfsug-discuss] Disk in unrecovered state In-Reply-To: References: <1959609895.205347.1610461943831.JavaMail.zimbra@ifca.unican.es> <3a7cf4ec9ee8458cb8275453f7b9d0a1@huk-coburg.de> <753758367.214256.1610467143052.JavaMail.zimbra@ifca.unican.es> Message-ID: <1691473688.446197.1610540777278.JavaMail.zimbra@ifca.unican.es> Thanks a lot!! Guys, this do the trick Now, whole disks are up again and the FS has been mounted without troubles, Cheers, I From anacreo at gmail.com Wed Jan 20 11:09:27 2021 From: anacreo at gmail.com (Alec) Date: Wed, 20 Jan 2021 03:09:27 -0800 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data Message-ID: We have AIX and Spectrum Scale 5.1 and are compressing older data. We can compress data at about 10GB/minute and decompress data wicked fast using mmchattr, when a user reads data from a compressed file via application open / read calls.... it moves at about 5MB/s. Normally our I/O pipeline allows for 2400MB/s on a single file read. What can we look at to speed up the read of the compressed data, are there any tunables that might affect this? As it is now if the backup daemon is backing up a compressed file, it can get stuck for hours, I will go and mmchattr to decompress the file, within a minute the file is decompressed, and backed up, then I simply recompress the file once backup has moved on. Any advice on how to improve the compressed reads under AIX would be very helpful. Alec -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Wed Jan 20 11:59:39 2021 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 20 Jan 2021 11:59:39 +0000 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jan 20 14:47:07 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 20 Jan 2021 15:47:07 +0100 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data In-Reply-To: References: Message-ID: This sounds like a bug to me... (I wouldn't expect mmchattr works on different node than other file access). I would check "mmdiag --iohist verbose" during these slow reads, to see if it gives a hint at what it's doing, versus what it shows during "mmchattr". Maybe one is triggering prefetch, while the other is some kind of random IO ? Also might be worth to try a mmtrace. Compare the traces for mmtrace start trace="all 0 vnode 1 vnop 1 io 1" cat compressedLargeFile mmtrace stop vs.: mmtrace start trace="all 0 vnode 1 vnop 1 io 1" mmchattr --compress no someLargeFile mmtrace stop (but please make sure that the file wasn't already uncompressed in pagepool in this second run). -jf On Wed, Jan 20, 2021 at 12:59 PM Daniel Kidger wrote: > I think you need to think about which node the file is being decompressed > on (and if that node has plenty of space in the page pool.) 
> iirc mmchattr works on one of the 'manager' nodes not necessarily the node > you typed the command on? > Daniel > > _________________________________________________________ > *Daniel Kidger Ph.D.* > IBM Technical Sales Specialist > Spectrum Scale, Spectrum Discover and IBM Cloud Object Storage > > +44-(0)7818 522 266 > daniel.kidger at uk.ibm.com > > > > > > > > > > ----- Original message ----- > From: Alec > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale 5 and Reading > Compressed Data > Date: Wed, Jan 20, 2021 11:10 > > We have AIX and Spectrum Scale 5.1 and are compressing older data. > > We can compress data at about 10GB/minute and decompress data wicked fast > using mmchattr, when a user reads data from a compressed file via > application open / read calls.... it moves at about 5MB/s. Normally our > I/O pipeline allows for 2400MB/s on a single file read. > > What can we look at to speed up the read of the compressed data, are there > any tunables that might affect this? > > As it is now if the backup daemon is backing up a compressed file, it can > get stuck for hours, I will go and mmchattr to decompress the file, within > a minute the file is decompressed, and backed up, then I simply recompress > the file once backup has moved on. > > Any advice on how to improve the compressed reads under AIX would be very > helpful. > > Alec > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=USgQqOp8HDCg0DYjdjSVFvVOwq1rMgRYPP_hoZqgUyI&s=_hdEB3EvWW-8ZzdS1D1roh92-AicdrVMywJwQGlKTIQ&e= > > > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Wed Jan 20 22:10:39 2021 From: anacreo at gmail.com (Alec) Date: Wed, 20 Jan 2021 14:10:39 -0800 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data Message-ID: I see a lot of references to the page pool. Our page pool is only 8 gb and our files can be very large into the terrabytes. I will try increasing the page pool in dev to 2x a test file and see if the problem resolves. Any documentation on the correlation here would be nice. I will see if I can get rights for the debug as well. Alec -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Fri Jan 22 11:44:56 2021 From: anacreo at gmail.com (Alec) Date: Fri, 22 Jan 2021 03:44:56 -0800 Subject: [gpfsug-discuss] Spectrum Scale 5 and Reading Compressed Data In-Reply-To: References: Message-ID: When comparing compression performance I see the following performance, is anyone else getting significantly higher on any other systems? 
Read Speeds: lz4 with null fill data, ~ 90MB/s lz4 with a SAS data set, ~40-50MB/s z with null fill data, ~ 15MB/s z with a SAS data set, ~ 5MB/s While on a 4G page pool I tested each of these file sizes and got roughly identical performance in all cases: 1 GB, 5 GB, and 10GB. This was on an S824 (p8) with read performance typically going to 1.2GB/s of read on a single thread (non-compressed). Doing a "very limited test" in Production hardware E850, 8gb Page Pool, with ~2.4 GB/s of read on a single thread (non-compressed) I got very similar results. In all cases the work was done from the NSD master, and due to the file sizes and the difference in page pool, i'd expect the 1gb files to move at a significantly faster pace if pagepool was a factor. If anyone could tell me what performance they get on their platform and what OS or Hardware they're using, I'd very much be interested. I'm debating if using GPFS to migrate the files to a .gz compressed version, and then providing a fifo mechanism to pipe through the compressed data wouldn't be a better solution. Alec On Wed, Jan 20, 2021 at 2:10 PM Alec wrote: > I see a lot of references to the page pool. Our page pool is only 8 gb and > our files can be very large into the terrabytes. > > I will try increasing the page pool in dev to 2x a test file and see if > the problem resolves. > > Any documentation on the correlation here would be nice. > > I will see if I can get rights for the debug as well. > > Alec > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cabrillo at ifca.unican.es Wed Jan 27 13:20:08 2021 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Wed, 27 Jan 2021 14:20:08 +0100 (CET) Subject: [gpfsug-discuss] cannot unmount fs Message-ID: <1692854831.1435321.1611753608516.JavaMail.zimbra@ifca.unican.es> Dear, We have a couple of GPFS fs, gpfs mount on /gpfs and gpfs2 mount on /gpfs/external, the problem is the mount path of the second fs sometimes is missied I am trying to mmumount this FS in order to change the mount path. but I cann't. If I make a mmumont gpfs2 or mmumount /gpfs/external I get this error: [root at gpfsgui ~]# mmumount gpfs2 Wed Jan 27 14:11:07 CET 2021: mmumount: Unmounting file systems ... umount: /gpfs/external: not mounted (/gpfs/external path exists) If I try to mmchfs -T XXX , the system says that the FS is already mounted. But there is no error in the logs. Any Idea? Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Jan 27 13:28:44 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 27 Jan 2021 13:28:44 +0000 Subject: [gpfsug-discuss] cannot unmount fs In-Reply-To: <1692854831.1435321.1611753608516.JavaMail.zimbra@ifca.unican.es> References: <1692854831.1435321.1611753608516.JavaMail.zimbra@ifca.unican.es> Message-ID: An HTML attachment was scrubbed... URL: From owen.morgan at motionpicturesolutions.com Wed Jan 27 17:14:45 2021 From: owen.morgan at motionpicturesolutions.com (Owen Morgan) Date: Wed, 27 Jan 2021 17:14:45 +0000 Subject: [gpfsug-discuss] General Introduction Message-ID: Hi Everyone, First off thanks for this user group existing! I've already watched a load of the great webinars that were uploaded to YouTube! My name is Owen Morgan and I'm currently the 'Archivist' at Motion Picture Solutions in the UK. MPS is a post-production and distribution facility for the major studios and a multitude of smaller studios. 
Their main area of operation is mastering and localisation of feature films along with trailer creation etc.. They also then have a combined Hard drive and Internet based distribution arm that can distribute all that content to all cinemas in the UK and, with a huge number of growing partners and co-investors, globally as well. My role started of primarily as just archiving data to tar based LTO tapes, but in recent times has moved to using Spectrum Scale and Spectrum Archive and now to pretty much managing those systems from a sysadmin level. Recently MPS invested in a Spectrum Scale system for their content network, and again, I'm starting to take over management of that both on a ILM perspective and actively involved with maintenance and support. Enough about me. I have a 'first question' but will send that over separately over the next day or so to stop this email being a novella! Thanks and nice to meet people! Owen. [Sent from Front] Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 -------------- next part -------------- An HTML attachment was scrubbed... URL: From owen.morgan at motionpicturesolutions.com Wed Jan 27 22:17:09 2021 From: owen.morgan at motionpicturesolutions.com (Owen Morgan) Date: Wed, 27 Jan 2021 22:17:09 +0000 Subject: [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... Message-ID: Hi Everyone, First question from me I appreciate this is policy engine thing as opposed to more fundamental Spectrum Scale so hope its ok! I'm trying to find a 'neat' way within a couple of policy rules to measure different time intervals (in days) but solely interested in WEEK DAYS only (ie delete files older than X week days only). An example is one of the rules a team would like implemented is delete all files older than 10 business days (ie week days only. We are ignoring public holidays as if they don't exist). Followed by a separate rule for a different folder of deleting all files older than 4 business days. The only way I've been able to facilitate this so far for the 4 business days is to separate out Fridays as a separate rule (because Friday - 4 days are all week days), then a separate rule for Monday through Thursday (because timestamp - 4 days has to factor in weekends, so has to actually set the INTERVAL to 6 days). Likewise for the 10 days rule I have to have a method to separate out Monday - Wednesday, and Thursday and Friday. I feel my 'solution', which does work, is extremely messy and not ideal should they want to add more rules as it just makes the policy file very long full of random definitions for all the different scenarios. So whilst the 'rules' are simple thanks to definitions, its the definitions themselves that are stacking up... depending on the interval required I have to create a unique set of is_weekday definitions and unique is_older_than_xdays definitions. 
here is a snippet of the policy:

define( is_older_than_4days, ( (CURRENT_TIMESTAMP - CREATION_TIME) >= INTERVAL '4' DAYS ) )
define( is_older_than_6days, ( (CURRENT_TIMESTAMP - CREATION_TIME) >= INTERVAL '6' DAYS ) )
define( is_weekday_ex_fri, ( DAYOFWEEK(CURRENT_DATE) IN (2,3,4,5) ) )
define( is_fri, ( DAYOFWEEK(CURRENT_DATE) = 6 ) )

RULE 'rule name' WHEN is_weekday_ex_fri DELETE WHERE include_list /* an include list just not added above */ AND is_older_than_6days
RULE 'rule name' WHEN is_fri DELETE WHERE include_list /* an include list just not added above */ AND is_older_than_4days

Are there any 'neat' other ways that are a tad more 'concise' for calculating INTERVAL X weekdays only which is easily and concisely extendable for any permutation of intervals required. I'm not sure how much SQL you can shoehorn into a policy before mmapplypolicy / policy engine isn't happy.

Thanks in advance,

Owen. [Sent from Front] Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 -------------- next part -------------- An HTML attachment was scrubbed... URL:

From owen.morgan at motionpicturesolutions.com Thu Jan 28 14:27:35 2021 From: owen.morgan at motionpicturesolutions.com (Owen Morgan) Date: Thu, 28 Jan 2021 14:27:35 +0000 Subject: [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... In-Reply-To: <1360632-1611790655.643971@r36M.X7Dl.WWDV> References: , <1360632-1611790655.643971@r36M.X7Dl.WWDV> Message-ID:

Mark,

Thank you for taking the time to comment, I genuinely appreciate it! I will digest and look at the mmfind examples (to be honest, didn't know it was a thing.....). Everything I know about Spectrum Scale (and Spectrum Archive) has been self taught so...... I'm pretty sure I'm missing Soooooooooo much useful info! I wish there was like a dummies guide (I've read the redbooks and admin guides as best I can but I know my knowledge is patchy at best)!

Once digested I may, or may not, have further questions but I genuinely thank you for your assistance.

Owen. [Sent from Front] Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482

On Wed, Jan 27 at 11:53 pm, > mark.bergman at uphs.upenn.edu wrote: In the message dated: Wed, 27 Jan 2021 22:17:09 +0000, The pithy ruminations from Owen Morgan on [[External] [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation....] were: => Hi Everyone, => => First question from me I appreciate this is policy engine thing as => opposed to more fundamental Spectrum Scale so hope its ok! It's great. => => I'm trying to find a 'neat' way within a couple of policy rules to => measure different time intervals (in days) but solely interested in WEEK => DAYS only (ie delete files older than X week days only). Policy SQL syntax gives me a headache. For this kind of task, I find that mmfind is your friend -- it's in the "examples" source dir within /usr/lpp/mmfs. Trivial to compile & install. Easier to debug, and it will generate the SQL.
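If it has to stay in pure policy SQL, one possible consolidation is to convert the business-day limit into a calendar-day limit at evaluation time, so a single parameterised macro covers every interval. The sketch below is untested; it assumes the rules only ever run on weekdays, that DAYOFWEEK() returns Sunday=1 .. Saturday=7 (as in the snippet above), that the documented policy SQL functions DAYS(), MOD(), INT() and CASE expressions are available, and that the policy file is passed through m4 so a $1 macro argument works.

/* age in whole days must reach: r + 7*q, plus 2 if the weekend has to be crossed,
   where q = INT(N/5), r = MOD(N,5) and N = the business-day limit ($1) */
define( is_older_than_weekdays,
  (
    (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME)) >=
    ( MOD($1,5) + 7 * INT($1 / 5) +
      CASE WHEN (DAYOFWEEK(CURRENT_DATE) - 1) > MOD($1,5) THEN 0 ELSE 2 END )
  )
)

define( is_weekday, ( DAYOFWEEK(CURRENT_DATE) IN (2,3,4,5,6) ) )

RULE 'purge after 4 business days'  WHEN is_weekday
  DELETE WHERE include_list /* same placeholder include list as above */ AND is_older_than_weekdays(4)
RULE 'purge after 10 business days' WHEN is_weekday
  DELETE WHERE include_list AND is_older_than_weekdays(10)

As a cross-check against the hand-written rules above: on a Friday, is_older_than_weekdays(4) reduces to 4 calendar days, on Monday through Thursday to 6, and is_older_than_weekdays(10) reduces to 14 calendar days on every weekday.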
=> => An example is one of the rules a team would like implemented is delete => all files older than 10 business days (ie week days only. We are What about "delete all files older than 12 calendar days" -- by definition, those files are older than 10 business days as well. => ignoring public holidays as if they don't exist). Followed by a separate => rule for a different folder of deleting all files older than 4 business => days. Or, older than 6 calendar days. Or, run this nightly: #! /bin/bash dateOffset=0 if [ `date '+%u'` -le 4 ] ; then # Mon=1, Tue=2, Wed=3, Thu=4 # # For a file to be more than 4 business days old on-or-before the # 4th day of the week, it must span the weekend, so offset the number # of required days in the file age dateOffset=2 fi mmfind -mtime $((4 + $dateOffset)) /path/to/Nuke/After/4/Days -xarg rm -f => => Thanks in advance, => => Owen. [Sent from Front] => => Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: => owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com => A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture => Solutions Ltd is a company registered in England and Wales under number => 5388229, VAT number 201330482 => -------------- next part -------------- An HTML attachment was scrubbed... URL: From jamervi at sandia.gov Thu Jan 28 18:26:37 2021 From: jamervi at sandia.gov (Mervini, Joseph A) Date: Thu, 28 Jan 2021 18:26:37 +0000 Subject: [gpfsug-discuss] Number of vCPUs exceeded Message-ID: <59193954-B649-4DF5-AD21-652922E49FD9@contoso.com> Hi, I haven?t seen this before but one of my remote cluster users reported the system in question is experiencing high loads and is with Scale unmounting the file system. This is the output she is seeing: Wed Jan 27 22:18:34.168 2021: [I] GPFS vCPU limits: Low warning limit 3 vCPUs, High warning limit 256 vCPUs, Hard limit 1536 vCPUs. Wed Jan 27 22:18:34.169 2021: [I] GPFS vCPU limits include all vCPUs that Linux sees as online or possibly online via hot add, ht/smt changes, etc. Wed Jan 27 22:18:34.170 2021: [X] GPFS detected 1792 vCPUs. This exceeds the warning limit of 256 vCPUs and the hard limit of 1536 vCPUs. GPFS will shutdown Any help will be appreciated. Thanks, Joe ==== Joe Mervini Sandia National Laboratories High Performance Computing 505.844.6770 jamervi at sandia.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From mzp at us.ibm.com Thu Jan 28 18:42:56 2021 From: mzp at us.ibm.com (Madhav Ponamgi1) Date: Thu, 28 Jan 2021 13:42:56 -0500 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 In-Reply-To: References: Message-ID: To calculate this directly (if you don't want to depend on a utility) consider the following steps. There are many more such algorithms in the wonderful book Calenderical Calculations. Take the last two digits of the year. Divide by 4, discarding any fraction. Add the day of the month. Add the month's key value: JFM AMJ JAS OND 144 025 036 146 Subtract 1 for January or February of a leap year. For a Gregorian date, add 0 for 1900's, 6 for 2000's, 4 for 1700's, 2 for 1800's; for other years, add or subtract multiples of 400. For a Julian date, add 1 for 1700's, and 1 for every additional century you go back. Add the last two digits of the year. Divide by 7 and take the remainder. 
--- Madhav mzp at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/28/2021 01:32 PM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 108, Issue 18 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... (Owen Morgan) 2. Number of vCPUs exceeded (Mervini, Joseph A) ---------------------------------------------------------------------- Message: 1 Date: Thu, 28 Jan 2021 14:27:35 +0000 From: Owen Morgan To: "mark.bergman at uphs.upenn.edu" , "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... Message-ID: Content-Type: text/plain; charset="utf-8" Mark, Thank you for taking the time to comment, I genuinely appreciate it! I will digest and look at the mmfind examples (to be honest, didn't know it was a thing.....). Everything I know about Spectrum Scale (and Spectrum Archive) has been self taught so...... I'm pretty sure I'm missing Soooooooooo much useful info! I wish there was like a dummies guide (I've read the redbooks and admin guides as best I can but I know my knowledge is patchy at best)! Once digested I may, or may not, have further questions but I genuinely thank you for your assistance. Owen. [Sent from Front] Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 On Wed, Jan 27 at 11:53 pm, > mark.bergman at uphs.upenn.edu< mailto:mark.bergman at uphs.upenn.edu> wrote: In the message dated: Wed, 27 Jan 2021 22:17:09 +0000, The pithy ruminations from Owen Morgan on [[External] [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation....] were: => Hi Everyone, => => First question from me I appreciate this is policy engine thing as => opposed to more fundamental Spectrum Scale so hope its ok! It's great. => => I'm trying to find a 'neat' way within a couple of policy rules to => measure different time intervals (in days) but solely interested in WEEK => DAYS only (ie delete files older than X week days only). Policy SQL syntax gives me a headache. For this kind of task, I find that mmfind is your friend -- it's in the "examples" source dir within /usr/lpp/mmfs. Trivial to compile & install. Easier to debug, and it will generate the SQL. => => An example is one of the rules a team would like implemented is delete => all files older than 10 business days (ie week days only. We are What about "delete all files older than 12 calendar days" -- by definition, those files are older than 10 business days as well. => ignoring public holidays as if they don't exist). 
Followed by a separate => rule for a different folder of deleting all files older than 4 business => days. Or, older than 6 calendar days. Or, run this nightly: #! /bin/bash dateOffset=0 if [ `date '+%u'` -le 4 ] ; then # Mon=1, Tue=2, Wed=3, Thu=4 # # For a file to be more than 4 business days old on-or-before the # 4th day of the week, it must span the weekend, so offset the number # of required days in the file age dateOffset=2 fi mmfind -mtime $((4 + $dateOffset)) /path/to/Nuke/After/4/Days -xarg rm -f => => Thanks in advance, => => Owen. [Sent from Front] => => Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: => owen.morgan at motionpicturesolutions.com< mailto:owen.morgan at motionpicturesolutions.com> | W: motionpicturesolutions.com< http://motionpicturesolutions.com > => A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture => Solutions Ltd is a company registered in England and Wales under number => 5388229, VAT number 201330482 => -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20210128/201a280e/attachment-0001.html > ------------------------------ Message: 2 Date: Thu, 28 Jan 2021 18:26:37 +0000 From: "Mervini, Joseph A" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Number of vCPUs exceeded Message-ID: <59193954-B649-4DF5-AD21-652922E49FD9 at contoso.com> Content-Type: text/plain; charset="utf-8" Hi, I haven?t seen this before but one of my remote cluster users reported the system in question is experiencing high loads and is with Scale unmounting the file system. This is the output she is seeing: Wed Jan 27 22:18:34.168 2021: [I] GPFS vCPU limits: Low warning limit 3 vCPUs, High warning limit 256 vCPUs, Hard limit 1536 vCPUs. Wed Jan 27 22:18:34.169 2021: [I] GPFS vCPU limits include all vCPUs that Linux sees as online or possibly online via hot add, ht/smt changes, etc. Wed Jan 27 22:18:34.170 2021: [X] GPFS detected 1792 vCPUs. This exceeds the warning limit of 256 vCPUs and the hard limit of 1536 vCPUs. GPFS will shutdown Any help will be appreciated. Thanks, Joe ==== Joe Mervini Sandia National Laboratories High Performance Computing 505.844.6770 jamervi at sandia.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20210128/930fadb1/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 108, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From knop at us.ibm.com Thu Jan 28 18:55:36 2021 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 28 Jan 2021 18:55:36 +0000 Subject: [gpfsug-discuss] Number of vCPUs exceeded In-Reply-To: <59193954-B649-4DF5-AD21-652922E49FD9@contoso.com> References: <59193954-B649-4DF5-AD21-652922E49FD9@contoso.com> Message-ID: An HTML attachment was scrubbed... 
URL: From UWEFALKE at de.ibm.com Thu Jan 28 19:54:38 2021 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 28 Jan 2021 20:54:38 +0100 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 In-Reply-To: References: Message-ID: sounds quite complicated. if all public holidays can be ignored it is simple: the algorithm has only to run on week days (the effective age of files would not change on weekend days.). To find the latest date to remove files: Now, enumerate the weekdays, starting with Mon=1 If your max age is T find the integer multiple of 5 and the remainder such that T=T_i*5 +R Determine the current DoW in terms of your enumeration. if DoW - R > 0, your max age date is Dx=D-(R+7*T_i) else your max age date is Dx=D-(R+2+7*T_i dates can be easily compiled in epoch, like D_e=$(date +%s), Dx_e = D_e - 86400*(R+7*T_i) or Dx_e = D_e - 86400*(R+2+7*T_i) you then need to convert the found epoch time back into a christian date which could be done by date --date='@ To: gpfsug-discuss at spectrumscale.org Date: 28/01/2021 19:43 Subject: [EXTERNAL] Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 Sent by: gpfsug-discuss-bounces at spectrumscale.org To calculate this directly (if you don't want to depend on a utility) consider the following steps. There are many more such algorithms in the wonderful book Calenderical Calculations. 1. Take the last two digits of the year. 2. Divide by 4, discarding any fraction. 3. Add the day of the month. 4. Add the month's key value: JFM AMJ JAS OND 144 025 036 146 5. Subtract 1 for January or February of a leap year. 6. For a Gregorian date, add 0 for 1900's, 6 for 2000's, 4 for 1700's, 2 for 1800's; for other years, add or subtract multiples of 400. 7. For a Julian date, add 1 for 1700's, and 1 for every additional century you go back. 8. Add the last two digits of the year. 9. Divide by 7 and take the remainder. --- Madhav mzp at us.ibm.com gpfsug-discuss-request---01/28/2021 01:32:13 PM---Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/28/2021 01:32 PM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 108, Issue 18 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... (Owen Morgan) 2. Number of vCPUs exceeded (Mervini, Joseph A) ---------------------------------------------------------------------- Message: 1 Date: Thu, 28 Jan 2021 14:27:35 +0000 From: Owen Morgan To: "mark.bergman at uphs.upenn.edu" , "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... Message-ID: Content-Type: text/plain; charset="utf-8" Mark, Thank you for taking the time to comment, I genuinely appreciate it! I will digest and look at the mmfind examples (to be honest, didn't know it was a thing.....). 
Everything I know about Spectrum Scale (and Spectrum Archive) has been self taught so...... I'm pretty sure I'm missing Soooooooooo much useful info! I wish there was like a dummies guide (I've read the redbooks and admin guides as best I can but I know my knowledge is patchy at best)! Once digested I may, or may not, have further questions but I genuinely thank you for your assistance. Owen. [Sent from Front] Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 On Wed, Jan 27 at 11:53 pm, > mark.bergman at uphs.upenn.edu< mailto:mark.bergman at uphs.upenn.edu> wrote: In the message dated: Wed, 27 Jan 2021 22:17:09 +0000, The pithy ruminations from Owen Morgan on [[External] [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation....] were: => Hi Everyone, => => First question from me I appreciate this is policy engine thing as => opposed to more fundamental Spectrum Scale so hope its ok! It's great. => => I'm trying to find a 'neat' way within a couple of policy rules to => measure different time intervals (in days) but solely interested in WEEK => DAYS only (ie delete files older than X week days only). Policy SQL syntax gives me a headache. For this kind of task, I find that mmfind is your friend -- it's in the "examples" source dir within /usr/lpp/mmfs. Trivial to compile & install. Easier to debug, and it will generate the SQL. => => An example is one of the rules a team would like implemented is delete => all files older than 10 business days (ie week days only. We are What about "delete all files older than 12 calendar days" -- by definition, those files are older than 10 business days as well. => ignoring public holidays as if they don't exist). Followed by a separate => rule for a different folder of deleting all files older than 4 business => days. Or, older than 6 calendar days. Or, run this nightly: #! /bin/bash dateOffset=0 if [ `date '+%u'` -le 4 ] ; then # Mon=1, Tue=2, Wed=3, Thu=4 # # For a file to be more than 4 business days old on-or-before the # 4th day of the week, it must span the weekend, so offset the number # of required days in the file age dateOffset=2 fi mmfind -mtime $((4 + $dateOffset)) /path/to/Nuke/After/4/Days -xarg rm -f => => Thanks in advance, => => Owen. [Sent from Front] => => Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: => owen.morgan at motionpicturesolutions.com< mailto:owen.morgan at motionpicturesolutions.com> | W: motionpicturesolutions.com => A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture => Solutions Ltd is a company registered in England and Wales under number => 5388229, VAT number 201330482 => -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20210128/201a280e/attachment-0001.html > ------------------------------ Message: 2 Date: Thu, 28 Jan 2021 18:26:37 +0000 From: "Mervini, Joseph A" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Number of vCPUs exceeded Message-ID: <59193954-B649-4DF5-AD21-652922E49FD9 at contoso.com> Content-Type: text/plain; charset="utf-8" Hi, I haven?t seen this before but one of my remote cluster users reported the system in question is experiencing high loads and is with Scale unmounting the file system. This is the output she is seeing: Wed Jan 27 22:18:34.168 2021: [I] GPFS vCPU limits: Low warning limit 3 vCPUs, High warning limit 256 vCPUs, Hard limit 1536 vCPUs. Wed Jan 27 22:18:34.169 2021: [I] GPFS vCPU limits include all vCPUs that Linux sees as online or possibly online via hot add, ht/smt changes, etc. Wed Jan 27 22:18:34.170 2021: [X] GPFS detected 1792 vCPUs. This exceeds the warning limit of 256 vCPUs and the hard limit of 1536 vCPUs. GPFS will shutdown Any help will be appreciated. Thanks, Joe ==== Joe Mervini Sandia National Laboratories High Performance Computing 505.844.6770 jamervi at sandia.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20210128/930fadb1/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 108, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mzp at us.ibm.com Fri Jan 29 12:38:37 2021 From: mzp at us.ibm.com (Madhav Ponamgi1) Date: Fri, 29 Jan 2021 07:38:37 -0500 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 20 In-Reply-To: References: Message-ID: Here is a simple C function posted from comp.lang.c many years ago that works for a restricted range (year > 1752) based on the algorithm I described earlier. dayofweek(y, m, d) { y -= m < 3; return (y + y/4 - y/100 + y/400 + "-bed=pen+mad."[m] + d) % 7; } --- Madhav From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/29/2021 07:00 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 108, Issue 20 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: gpfsug-discuss Digest, Vol 108, Issue 18 (Uwe Falke) ---------------------------------------------------------------------- Message: 1 Date: Thu, 28 Jan 2021 20:54:38 +0100 From: "Uwe Falke" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 Message-ID: Content-Type: text/plain; charset="ISO-8859-1" sounds quite complicated. 
if all public holidays can be ignored it is simple: the algorithm has only to run on week days (the effective age of files would not change on weekend days.). To find the latest date to remove files: Now, enumerate the weekdays, starting with Mon=1 If your max age is T find the integer multiple of 5 and the remainder such that T=T_i*5 +R Determine the current DoW in terms of your enumeration. if DoW - R > 0, your max age date is Dx=D-(R+7*T_i) else your max age date is Dx=D-(R+2+7*T_i dates can be easily compiled in epoch, like D_e=$(date +%s), Dx_e = D_e - 86400*(R+7*T_i) or Dx_e = D_e - 86400*(R+2+7*T_i) you then need to convert the found epoch time back into a christian date which could be done by date --date='@ To: gpfsug-discuss at spectrumscale.org Date: 28/01/2021 19:43 Subject: [EXTERNAL] Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 Sent by: gpfsug-discuss-bounces at spectrumscale.org To calculate this directly (if you don't want to depend on a utility) consider the following steps. There are many more such algorithms in the wonderful book Calenderical Calculations. 1. Take the last two digits of the year. 2. Divide by 4, discarding any fraction. 3. Add the day of the month. 4. Add the month's key value: JFM AMJ JAS OND 144 025 036 146 5. Subtract 1 for January or February of a leap year. 6. For a Gregorian date, add 0 for 1900's, 6 for 2000's, 4 for 1700's, 2 for 1800's; for other years, add or subtract multiples of 400. 7. For a Julian date, add 1 for 1700's, and 1 for every additional century you go back. 8. Add the last two digits of the year. 9. Divide by 7 and take the remainder. --- Madhav mzp at us.ibm.com gpfsug-discuss-request---01/28/2021 01:32:13 PM---Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/28/2021 01:32 PM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 108, Issue 18 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... (Owen Morgan) 2. Number of vCPUs exceeded (Mervini, Joseph A) ---------------------------------------------------------------------- Message: 1 Date: Thu, 28 Jan 2021 14:27:35 +0000 From: Owen Morgan To: "mark.bergman at uphs.upenn.edu" , "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation.... Message-ID: Content-Type: text/plain; charset="utf-8" Mark, Thank you for taking the time to comment, I genuinely appreciate it! I will digest and look at the mmfind examples (to be honest, didn't know it was a thing.....). Everything I know about Spectrum Scale (and Spectrum Archive) has been self taught so...... I'm pretty sure I'm missing Soooooooooo much useful info! 
I wish there was like a dummies guide (I've read the redbooks and admin guides as best I can but I know my knowledge is patchy at best)! Once digested I may, or may not, have further questions but I genuinely thank you for your assistance. Owen. [Sent from Front] Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 On Wed, Jan 27 at 11:53 pm, > mark.bergman at uphs.upenn.edu< mailto:mark.bergman at uphs.upenn.edu> wrote: In the message dated: Wed, 27 Jan 2021 22:17:09 +0000, The pithy ruminations from Owen Morgan on [[External] [gpfsug-discuss] Policy Rules Syntax to find files older than X days excluding weekends in the calculation....] were: => Hi Everyone, => => First question from me I appreciate this is policy engine thing as => opposed to more fundamental Spectrum Scale so hope its ok! It's great. => => I'm trying to find a 'neat' way within a couple of policy rules to => measure different time intervals (in days) but solely interested in WEEK => DAYS only (ie delete files older than X week days only). Policy SQL syntax gives me a headache. For this kind of task, I find that mmfind is your friend -- it's in the "examples" source dir within /usr/lpp/mmfs. Trivial to compile & install. Easier to debug, and it will generate the SQL. => => An example is one of the rules a team would like implemented is delete => all files older than 10 business days (ie week days only. We are What about "delete all files older than 12 calendar days" -- by definition, those files are older than 10 business days as well. => ignoring public holidays as if they don't exist). Followed by a separate => rule for a different folder of deleting all files older than 4 business => days. Or, older than 6 calendar days. Or, run this nightly: #! /bin/bash dateOffset=0 if [ `date '+%u'` -le 4 ] ; then # Mon=1, Tue=2, Wed=3, Thu=4 # # For a file to be more than 4 business days old on-or-before the # 4th day of the week, it must span the weekend, so offset the number # of required days in the file age dateOffset=2 fi mmfind -mtime $((4 + $dateOffset)) /path/to/Nuke/After/4/Days -xarg rm -f => => Thanks in advance, => => Owen. [Sent from Front] => => Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: => owen.morgan at motionpicturesolutions.com< mailto:owen.morgan at motionpicturesolutions.com> | W: motionpicturesolutions.com< http://motionpicturesolutions.com > => A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture => Solutions Ltd is a company registered in England and Wales under number => 5388229, VAT number 201330482 => -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20210128/201a280e/attachment-0001.html > ------------------------------ Message: 2 Date: Thu, 28 Jan 2021 18:26:37 +0000 From: "Mervini, Joseph A" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Number of vCPUs exceeded Message-ID: <59193954-B649-4DF5-AD21-652922E49FD9 at contoso.com> Content-Type: text/plain; charset="utf-8" Hi, I haven?t seen this before but one of my remote cluster users reported the system in question is experiencing high loads and is with Scale unmounting the file system. 
This is the output she is seeing: Wed Jan 27 22:18:34.168 2021: [I] GPFS vCPU limits: Low warning limit 3 vCPUs, High warning limit 256 vCPUs, Hard limit 1536 vCPUs. Wed Jan 27 22:18:34.169 2021: [I] GPFS vCPU limits include all vCPUs that Linux sees as online or possibly online via hot add, ht/smt changes, etc. Wed Jan 27 22:18:34.170 2021: [X] GPFS detected 1792 vCPUs. This exceeds the warning limit of 256 vCPUs and the hard limit of 1536 vCPUs. GPFS will shutdown Any help will be appreciated. Thanks, Joe ==== Joe Mervini Sandia National Laboratories High Performance Computing 505.844.6770 jamervi at sandia.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20210128/930fadb1/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 108, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 108, Issue 20 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jpr9c at virginia.edu Fri Jan 29 19:47:13 2021 From: jpr9c at virginia.edu (Ruffner, Scott (jpr9c)) Date: Fri, 29 Jan 2021 19:47:13 +0000 Subject: [gpfsug-discuss] Adding client nodes using a shared NFS root image. Message-ID: <4A332838-9D59-477D-AAE2-F79F8AAD143B@virginia.edu> Hi everyone, We want all of our compute nodes (bare metal) to directly participate in the cluster as client nodes; of course, they are sharing a common root image. Adding nodes via the regular mmaddnode (with the dsh operation to replicate files to the clients) isn?t really viable, but if I short-circuit that, and simply generate the /var/mmfs/gen files and then manually copy those and the keyfiles to the shared root images, is that safe? Am I going about this the entirely wrong way? -- Scott Ruffner Senior HPC Engineer UVa Research Computing (434)924-6778(o) (434)295-0250(h) sruffner at virginia.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Jan 29 19:52:04 2021 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 29 Jan 2021 14:52:04 -0500 Subject: [gpfsug-discuss] Adding client nodes using a shared NFS root image. In-Reply-To: <4A332838-9D59-477D-AAE2-F79F8AAD143B@virginia.edu> References: <4A332838-9D59-477D-AAE2-F79F8AAD143B@virginia.edu> Message-ID: <094EDEFE-4B15-4214-90C4-CD83BC76A10A@brown.edu> We use mmsdrrestore after the node boots. In our case these are diskless nodes provisioned by xCAT. The post install script takes care of ensuring infiniband is lit up, and does the mmsdrrestore followed by mmstartup. -- ddj Dave Johnson > On Jan 29, 2021, at 2:47 PM, Ruffner, Scott (jpr9c) wrote: > > ? 
> Hi everyone, > > We want all of our compute nodes (bare metal) to directly participate in the cluster as client nodes; of course, they are sharing a common root image. > > Adding nodes via the regular mmaddnode (with the dsh operation to replicate files to the clients) isn?t really viable, but if I short-circuit that, and simply generate the /var/mmfs/gen files and then manually copy those and the keyfiles to the shared root images, is that safe? > > Am I going about this the entirely wrong way? > > -- > Scott Ruffner > Senior HPC Engineer > UVa Research Computing > (434)924-6778(o) > (434)295-0250(h) > sruffner at virginia.edu > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpr9c at virginia.edu Fri Jan 29 20:04:32 2021 From: jpr9c at virginia.edu (Ruffner, Scott (jpr9c)) Date: Fri, 29 Jan 2021 20:04:32 +0000 Subject: [gpfsug-discuss] Adding client nodes using a shared NFS root image. In-Reply-To: <094EDEFE-4B15-4214-90C4-CD83BC76A10A@brown.edu> References: <4A332838-9D59-477D-AAE2-F79F8AAD143B@virginia.edu> <094EDEFE-4B15-4214-90C4-CD83BC76A10A@brown.edu> Message-ID: <6A72D8F2-65ED-431C-B13F-3D4F189A53DF@virginia.edu> Thanks David! Slick solution. -- Scott Ruffner Senior HPC Engineer UVa Research Computing (434)924-6778(o) (434)295-0250(h) sruffner at virginia.edu From: on behalf of "david_johnson at brown.edu" Reply-To: gpfsug main discussion list Date: Friday, January 29, 2021 at 2:52 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Adding client nodes using a shared NFS root image. We use mmsdrrestore after the node boots. In our case these are diskless nodes provisioned by xCAT. The post install script takes care of ensuring infiniband is lit up, and does the mmsdrrestore followed by mmstartup. -- ddj Dave Johnson On Jan 29, 2021, at 2:47 PM, Ruffner, Scott (jpr9c) wrote: Hi everyone, We want all of our compute nodes (bare metal) to directly participate in the cluster as client nodes; of course, they are sharing a common root image. Adding nodes via the regular mmaddnode (with the dsh operation to replicate files to the clients) isn?t really viable, but if I short-circuit that, and simply generate the /var/mmfs/gen files and then manually copy those and the keyfiles to the shared root images, is that safe? Am I going about this the entirely wrong way? -- Scott Ruffner Senior HPC Engineer UVa Research Computing (434)924-6778(o) (434)295-0250(h) sruffner at virginia.edu _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From owen.morgan at motionpicturesolutions.com Sat Jan 30 00:31:27 2021 From: owen.morgan at motionpicturesolutions.com (Owen Morgan) Date: Sat, 30 Jan 2021 00:31:27 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 Message-ID: Hi all, Sorry I appear to have missed a load of replies and screwed up the threading thing when looking online... not used to this email group thing! Might look at the slack option! 
Just wanted to clarify my general issue a bit:

So the methodology I've started to implement is per-department policy files, where all rules related to managing a specific team's assets are in one policy file, and then I have fine control over when and how each department's rules run, and potentially (if it mattered) in what order etc.

So team A want me to manage two folders where, in folder 1a, all files older than 4 week days of age are deleted, and in folder 1b all files older than 8 week days are deleted.

They now want me to manage a different set of two folders with two different "thresholds" for how old they need to be in week days before they delete (ie. I now need additional rules for folders 2a and 2b).

The issue is that for each scenario there is a different 'offset' required depending on the day of the week the policy is run, to maintain the number of weekdays required (the 'threshold' is always in weekdays, so intervening weekends need to be added to take them into account).

For instance, when run on a Monday, if the threshold were 4 weekdays of age, I need to be deleting files that were created on the previous Tuesday, which is 6 days (ie 4 days + 2 weekend days). If the threshold was 8 week days, the threshold in terms of the policy would be 12 (ie 8 plus 2x 2 weekend days).

The only way I was able to work this out in the SQL-like policy file was to split the week days into groups where the offset would be the same (so for 4 week days, Monday through Thursday share the offset of 2 - which then has to be added to the 4 for the desired result) and then a separate rule for the Friday.

However, for every addition of a different threshold I have to write all new groups to match the days etc.. so the policy ends up with 6 rules but 150 lines of definition macros....

I was trying to work out if there was a more concise way of, within the SQL-like framework, programmatically calculating the day offset that needs to be added to the threshold, to allow a more generic function that could just automatically work it out....

The algorithm I have recently thought up is to effectively calculate the difference in weeks between the current run time and the desired deletion day and multiply it by 2.

In pseudocode it would be (threshold is the number of week days for the rule, offset is the number that needs to be added to account for the weekends between those dates):

If current day of month - threshold = Sunday, then add 1 to the threshold value (Sundays are denoted as the week start, so Saturday would represent the previous week).

Offset = (difference between current week and week of (current day of month - threshold)) x 2

A worked example:

Threshold = 11 week days
Policy run on the 21st Jan, which is week 4 of 2021
21st - 11 days = Sunday 10th
Therefore need to add 1 to the threshold to push the day into the previous week. New threshold is 12
Saturday 9th is in week 2 of 2021, so the offset is week 4 - week 2 = 2 (ie difference in weeks) x 2, which is 4.
Add 4 to the original 11 to make 15.

So for the policy running on the 21st Jan, to delete only files older than 11 week days of age, I need to set my rule to be

Delete where ((Current_date - creation_time) >= interval '15' days)

Unfortunately, I'm now struggling to implement that algorithm..... it seems the SQL-ness is very limited and I can't declare variables to use or anything like that.... it's a shame, as the algorithm is generic, so it only needs to be written once and you could have as many unique rules as you want, all with different thresholds etc...
> > > The only way I was able to work this out in the sql like policy file was > to split the week days into groups where the offset would be the same (so > for 4 week days, Monday through Thursday share the offset of 2 - which then > has to be added to the 4 for the desired result) and then a separate rule > for the Friday. > > > However for every addition of a different threshold I have to write all > new groups to match the days etc.. so the policy ends up with 6 rules but > 150 lines of definition macros.... > > > I was trying to work out if there was a more concise way of, within the > sql like framework, programmatically calculating the day offest the needs > to be added to the threshold to allow a more generic function that could > just automatically work it out.... > > > The algorithm I have recently thought up is to effectively calculate the > difference in weeks between the current run time and the desired deletion > day and multiply it by 2. > > > Psudocode it would be (threshold is the number of week days for the rule, > offset is the number that needs to be added to account for the weekends > between those dates): > > > If current day of month - threshold = sunday, then add 1 to the threshold > value (sundays are de oted as the week start so Saturday would represent > the previous week). > > Offset = (difference between current week and week of (current day of > month - threshold)) x 2 > > A worked example: > > Threshold = 11 week days > Policy run on the 21st Jan which is the week 4 of 2021 > > 21st - 11 days = Sunday 10th > > Therefore need to add 1 to threshold to push the day into the previous > week. New threshold is 12 > > Saturday 9th is in week 2 of 2021 so the offset is week 4 - week 2 = 2 (ie > difference in weeks) x 2 which is 4. > > Add 4 to the original 11 to make 15. > > So for the policy running on the 21st Jan to delete only files older than > 11 week days of age I need to set my rule to be > > Delete where ((Current_date - creation_time) >= interval '15' days > > > Unfortunately, I'm now struggling to implement that algorithm..... it > seems the SQL-ness is very limited and I cant declare variables to use or > stuff.... its a shame as that algorithm is generic so only needs to be > written once and you could have ad many unique rules as you want all with > different thresholds etc... > > Is there another way to get the same results? > > I would prefer to stay in the bounds of the SQL policy rule setup as that > is the framework I have created and started to implement.. > > Hope the above gives more clarity to what Im asking.... sorry if one of > the previous rplies addresses this, if it does I clearly was confused by > the response (I seriously feel like an amateur at this at the moment and am > having to learn all these finer things as I go). > > Thanks in advance, > > Owen. > > Owen Morgan? > Data Wrangler > Motion Picture Solutions Ltd > T: > E: *owen.morgan at motionpicturesolutions.com* > | W: > *motionpicturesolutions.com* > A: Mission Hall, 9?11 North End Road , London , W14 8ST > Motion Picture Solutions Ltd is a company registered in England and Wales > under number 5388229, VAT number 201330482 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From anacreo at gmail.com Sat Jan 30 03:07:24 2021 From: anacreo at gmail.com (Alec) Date: Fri, 29 Jan 2021 19:07:24 -0800 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 In-Reply-To: References: Message-ID: Also a caution on this... you may want to retain the file's modified time in something like purge.modified... so you can also re-calc for files where purge.modified != file modified time. Else you may purge something too early. Alec On Fri, Jan 29, 2021 at 6:53 PM Alec wrote: > Based on the problem you have. > > I would write an mmfind / mmxarg command that sets a custom attr such as > puge.after, have a ksh/perl/php script that simply makes the necessary > calculations using all the tricks it has... Skip files that already have > the attribute set, or are too new to bother having the attribute. > > Then use a single purge policy to query all files that have a purge.after > set to the appropriate datestamp. > > You could get way more concise with this mechanism and have a much simpler > process. > > Alec > > On Fri, Jan 29, 2021 at 4:32 PM Owen Morgan < > owen.morgan at motionpicturesolutions.com> wrote: > >> Hi all, >> >> Sorry I appear to have missed a load of replies and screwed up the >> threading thing when looking online... not used to this email group thing! >> Might look at the slack option! >> >> Just wanted to clarify my general issue a bit: >> >> So the methodology I've started to implement is per department policy >> files where all rules related to managing a specific teams assets are all >> in one policy file and then I have fine control over when and how each >> departments rule run, when, and potentially (if it mattered) what order etc. >> >> >> So team a want me to manage two folders where in folder 1a all files >> older than 4 week days of age are deleted, and in filder 1b all files older >> than 8 week days are deleted. >> >> They now want me to manage a different set of two folders with two >> different "thresholds" for how old they need to be in week days before they >> delete (ie. I now need additional rules for folders 2a and 2b). >> >> >> The issue is for each scenario there is a different 'offset' required >> depending on the day of the week the policy is run to maintian the number >> of weekdays required (the 'threshold' is always in weekdays, so intervening >> weekends need to be added to take them into account). >> >> For instance when run on a Monday, if the threshold were 4 weekdays of >> age, I need to be deleting files that were created on the previous Tuesday. >> Which is 6 days (ie 4 days + 2 weekend days). If the threshold was 8 week >> days the threhold in terms of the policy would be 12 (ie 8 plus 2x 2 >> weekend days). >> >> >> The only way I was able to work this out in the sql like policy file was >> to split the week days into groups where the offset would be the same (so >> for 4 week days, Monday through Thursday share the offset of 2 - which then >> has to be added to the 4 for the desired result) and then a separate rule >> for the Friday. >> >> >> However for every addition of a different threshold I have to write all >> new groups to match the days etc.. so the policy ends up with 6 rules but >> 150 lines of definition macros.... >> >> >> I was trying to work out if there was a more concise way of, within the >> sql like framework, programmatically calculating the day offest the needs >> to be added to the threshold to allow a more generic function that could >> just automatically work it out.... 
>> >> >> The algorithm I have recently thought up is to effectively calculate the >> difference in weeks between the current run time and the desired deletion >> day and multiply it by 2. >> >> >> Psudocode it would be (threshold is the number of week days for the rule, >> offset is the number that needs to be added to account for the weekends >> between those dates): >> >> >> If current day of month - threshold = sunday, then add 1 to the threshold >> value (sundays are de oted as the week start so Saturday would represent >> the previous week). >> >> Offset = (difference between current week and week of (current day of >> month - threshold)) x 2 >> >> A worked example: >> >> Threshold = 11 week days >> Policy run on the 21st Jan which is the week 4 of 2021 >> >> 21st - 11 days = Sunday 10th >> >> Therefore need to add 1 to threshold to push the day into the previous >> week. New threshold is 12 >> >> Saturday 9th is in week 2 of 2021 so the offset is week 4 - week 2 = 2 >> (ie difference in weeks) x 2 which is 4. >> >> Add 4 to the original 11 to make 15. >> >> So for the policy running on the 21st Jan to delete only files older than >> 11 week days of age I need to set my rule to be >> >> Delete where ((Current_date - creation_time) >= interval '15' days >> >> >> Unfortunately, I'm now struggling to implement that algorithm..... it >> seems the SQL-ness is very limited and I cant declare variables to use or >> stuff.... its a shame as that algorithm is generic so only needs to be >> written once and you could have ad many unique rules as you want all with >> different thresholds etc... >> >> Is there another way to get the same results? >> >> I would prefer to stay in the bounds of the SQL policy rule setup as that >> is the framework I have created and started to implement.. >> >> Hope the above gives more clarity to what Im asking.... sorry if one of >> the previous rplies addresses this, if it does I clearly was confused by >> the response (I seriously feel like an amateur at this at the moment and am >> having to learn all these finer things as I go). >> >> Thanks in advance, >> >> Owen. >> >> Owen Morgan? >> Data Wrangler >> Motion Picture Solutions Ltd >> T: >> E: *owen.morgan at motionpicturesolutions.com* >> | W: >> *motionpicturesolutions.com* >> A: Mission Hall, 9?11 North End Road , London , W14 8ST >> Motion Picture Solutions Ltd is a company registered in England and Wales >> under number 5388229, VAT number 201330482 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From owen.morgan at motionpicturesolutions.com Sat Jan 30 03:39:42 2021 From: owen.morgan at motionpicturesolutions.com (Owen Morgan) Date: Sat, 30 Jan 2021 03:39:42 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 Message-ID: Alec, Thank you for your response! I get it now! And, I also understand some of the other peoples responses better as well! Not only does this make sense I also suppose that it shows I have to broaden my 'ideas' as to what tools avaliable can be used more than mmapplypolicy and policy files alone. Using the power of all of them provides more ability than just focusing on one! 
Just want to thank you, and the other respondents as you've genuinely helped me and I've learnt new things in the process (until I posted the original question I didn't even know mmfind was a thing!) Thanks! Owen. Owen Morgan Data Wrangler Motion Picture Solutions Ltd T: E: owen.morgan at motionpicturesolutions.com | W: motionpicturesolutions.com A: Mission Hall, 9-11 North End Road, London, W14 8ST Motion Picture Solutions Ltd is a company registered in England and Wales under number 5388229, VAT number 201330482 -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Sat Jan 30 04:40:44 2021 From: anacreo at gmail.com (Alec) Date: Fri, 29 Jan 2021 20:40:44 -0800 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18 In-Reply-To: References: Message-ID: No problem at all. If you can't get mmfind compiled... you can do everything it does via mmapplypolicy. But it is certainly easier with mmfind to add in options dynamically. I have modified the program that mmfind invokes... I forget offhand tr_Polsomething.pl to add functions such as -gpfsCompress_lz4 and -gpfsIsCompressed. Spectrum Scale really has way more power than most people know what to do with... I wish there was a much richer library of scripts available. For instance with mmfind, this saved my bacon a few days ago.. as our 416TB file system had less than 400GB free... mmfind -polArgs "-a 8 -N node1,node2 -B 20" /sasfilesystem -mtime +1800 -name '*.sas7bdat' -size +1G -not -gpfsIsCompressed -gpfsCompress_lz4 (I had to add in my own -gpfsIsCompressed and -gpfsCompress_lz4 features... but that was fairly easy) -- Find any file named '*.sas7bdat' over 1800 days (5 years), larger than 1G, and compress it down using lz4... Farmed it out to my two app nodes 8 threads each... and 14000 files compressed overnight. Next morning I had an extra 5TB of free space.. funny thing is I needed to run it on my app nodes to slow down their write capacity so we didn't get a fatal out of capacity. If you really want to have fun, check out the ksh93 built in time functions pairs nicely with this requirement. Output the day of the week corresponding to the last day of February 2008. $ printf "%(%a)T\n" "final day Feb 2008" Fri Output the date corresponding to the third Wednesday in May 2008. $ printf "%(%D)T\n" "3rd wednesday may 2008" 05/21/08 Output what date it was 4 weeks ago. $ printf "%(%D)T\n" "4 weeks ago" 02/18/08 Read more: https://blog.fpmurphy.com/2008/10/ksh93-date-manipulation.html#ixzz6l0Egm6hp On Fri, Jan 29, 2021 at 7:39 PM Owen Morgan < owen.morgan at motionpicturesolutions.com> wrote: > Alec, > > Thank you for your response! > > I get it now! And, I also understand some of the other peoples responses > better as well! > > Not only does this make sense I also suppose that it shows I have to > broaden my 'ideas' as to what tools avaliable can be used more than > mmapplypolicy and policy files alone. Using the power of all of them > provides more ability than just focusing on one! > > Just want to thank you, and the other respondents as you've genuinely > helped me and I've learnt new things in the process (until I posted the > original question I didn't even know mmfind was a thing!) > > Thanks! > > Owen. > > Owen Morgan? 
> Data Wrangler
> Motion Picture Solutions Ltd
> T:
> E: *owen.morgan at motionpicturesolutions.com*
> | W: *motionpicturesolutions.com*
> A: Mission Hall, 9-11 North End Road, London, W14 8ST
> Motion Picture Solutions Ltd is a company registered in England and Wales
> under number 5388229, VAT number 201330482
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Walter.Sklenka at EDV-Design.at  Sat Jan 30 05:45:47 2021
From: Walter.Sklenka at EDV-Design.at (Walter Sklenka)
Date: Sat, 30 Jan 2021 05:45:47 +0000
Subject: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD Server with only ib rdma enabled
Message-ID: <14218088180e4613847984c44e0321d8@Mail.EDVDesign.cloudia>

Hi!

Is it possible to mix OPA cards and Infiniband HCAs on the same server?

In the FAQ
https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#rdma
they talk about RDMA:

"RDMA is NOT supported on a node when both Mellanox HCAs and Intel Omni-Path HFIs are ENABLED for RDMA."

So do I understand right: when we do NOT enable the OPA interface, we can still enable IB?

The reason I ask is that we have a GPFS cluster of 6 NSD servers (with access to storage) with OPA interfaces which provide access to a remote cluster, also via OPA.

A new cluster with HDR interfaces will be implemented soon. They shall have access to the same filesystems.

When we add HDR interfaces to the NSD servers and enable RDMA on this network while disabling RDMA on OPA, we would accept the worse performance via OPA. We hope that this still provides better performance and less technical overhead than using routers.

Or am I totally wrong?

Thank you very much and keep healthy!

Best regards
Walter

Mit freundlichen Grüßen
Walter Sklenka
Technical Consultant

EDV-Design Informationstechnologie GmbH
Giefinggasse 6/1/2, A-1210 Wien
Tel: +43 1 29 22 165-31
Fax: +43 1 29 22 165-90
E-Mail: sklenka at edv-design.at
Internet: www.edv-design.at

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jonathan.buzzard at strath.ac.uk  Sat Jan 30 10:29:39 2021
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sat, 30 Jan 2021 10:29:39 +0000
Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 108, Issue 18
In-Reply-To:
References:
Message-ID:

On 30/01/2021 00:31, Owen Morgan wrote:

[SNIP]

>
> I would prefer to stay in the bounds of the SQL policy rule setup as
> that is the framework I have created and started to implement..
>

In general SQL is Turing complete. Though I have not checked in detail, I believe the SQL of the policy engine is too. I would also note that SQL has a whole bunch of time/date functions.

So something like

define(offset, 4)
define(day, DAYOFWEEK(CURRENT_TIMESTAMP))
define(age, (DAYS(CURRENT_TIMESTAMP)-DAYS(ACCESS_TIME)))
define(workingdays,
       CASE WHEN day=1 THEN offset+1
            WHEN day=6 THEN offset
            WHEN day=7 THEN offset+1
            ELSE offset+2
       END)

/* delete all files older than 4 working days */
RULE 'purge4' DELETE FOR FILESET('dummies') WHERE (age > workingdays)

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
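Following the same macro style, the per-threshold day groups Owen mentions can be collapsed into one short macro per threshold, each holding hand-computed calendar-day equivalents. This is only an untested sketch along the lines of the rule above: the fileset names are placeholders, it keys on CREATION_TIME as in Owen's description, and it assumes the policy runs Monday to Friday and that DAYOFWEEK() numbers Sunday as 1.

define(day, DAYOFWEEK(CURRENT_TIMESTAMP))                    /* 1=Sunday .. 7=Saturday */
define(age, (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME))) /* age in calendar days   */

/* N weekdays = N + 2*(full weeks of 5) + 2 more if the leftover weekdays
   reach back across a weekend, i.e. if (N modulo 5) >= day-1             */
define(wd4,  (CASE WHEN day <= 5 THEN 6  ELSE 4  END))
define(wd8,  (CASE WHEN day <= 4 THEN 12 ELSE 10 END))
define(wd11, (CASE WHEN day <= 2 THEN 17 ELSE 15 END))

RULE 'purge_1a' DELETE FOR FILESET('dept_1a') WHERE (age >= wd4)
RULE 'purge_1b' DELETE FOR FILESET('dept_1b') WHERE (age >= wd8)
RULE 'purge_2a' DELETE FOR FILESET('dept_2a') WHERE (age >= wd11)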
From giovanni.bracco at enea.it  Sat Jan 30 17:07:43 2021
From: giovanni.bracco at enea.it (Giovanni Bracco)
Date: Sat, 30 Jan 2021 18:07:43 +0100
Subject: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD Server with only ib rdma enabled
In-Reply-To: <14218088180e4613847984c44e0321d8@Mail.EDVDesign.cloudia>
References: <14218088180e4613847984c44e0321d8@Mail.EDVDesign.cloudia>
Message-ID: <3bb0f4ca-f6ee-6013-45a0-e783470089f0@enea.it>

In our HPC infrastructure we have 6 NSD servers, running CentOS 7.4, each of them with 1 Intel QDR HCA to a QDR cluster (now 100 SandyBridge nodes; it was 300 nodes on CentOS 6.5), 1 OPA HCA to the main OPA cluster (400 Skylake nodes, CentOS 7.3) and 1 Mellanox FDR HCA to the DDN storage, and it has worked nicely using RDMA since 2018. GPFS 4.2.3-19.

See F. Iannone et al., "CRESCO ENEA HPC clusters: a working example of a multifabric GPFS Spectrum Scale layout," 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, 2019, pp. 1051-1052, doi: 10.1109/HPCS48598.2019.918813

When setting up the system the main trick has been: just use the CentOS drivers and do not install OFED.

We do not use IPoIB.

Giovanni

On 30/01/21 06:45, Walter Sklenka wrote:
> Hi!
>
> Is it possible to mix OPA cards and Infiniband HCAs on the same server?
>
> In the faq
> https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#rdma
>
> They talk about RDMA :
>
> "RDMA is NOT supported on a node when both Mellanox HCAs and Intel
> Omni-Path HFIs are ENABLED for RDMA."
>
> So do I understand right: When we do NOT enable the OPA interface we
> can still enable IB ?
>
> The reason I ask is, that we have a gpfs cluster of 6 NSD Servers (with
> access to storage) with OPA interfaces which provide access to a remote
> cluster also via OPA.
>
> A new cluster with HDR interfaces will be implemented soon
>
> They shall have access to the same filesystems
>
> When we add HDR interfaces to NSD servers and enable rdma on this
> network while disabling rdma on OPA we would accept the worse
> performance via OPA. We hope that this provides still better perf and
> less technical overhead than using routers
>
> Or am I totally wrong?
>
> Thank you very much and keep healthy!
>
> Best regards
>
> Walter
>
> Mit freundlichen Grüßen
> */Walter Sklenka/*
> */Technical Consultant/*
>
> EDV-Design Informationstechnologie GmbH
> Giefinggasse 6/1/2, A-1210 Wien
> Tel: +43 1 29 22 165-31
> Fax: +43 1 29 22 165-90
> E-Mail: sklenka at edv-design.at
> Internet: www.edv-design.at
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

--
Giovanni Bracco
phone +39 351 8804788
E-mail giovanni.bracco at enea.it
WWW http://www.afs.enea.it/bracco

From Walter.Sklenka at EDV-Design.at  Sat Jan 30 20:01:51 2021
From: Walter.Sklenka at EDV-Design.at (Walter Sklenka)
Date: Sat, 30 Jan 2021 20:01:51 +0000
Subject: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD Server with only ib rdma enabled
In-Reply-To: <3bb0f4ca-f6ee-6013-45a0-e783470089f0@enea.it>
References: <14218088180e4613847984c44e0321d8@Mail.EDVDesign.cloudia> <3bb0f4ca-f6ee-6013-45a0-e783470089f0@enea.it>
Message-ID:

Hi Giovanni!
That's great! Many thanks for your fast and detailed answer!!!!
So this is the way we will go too!

Have a nice weekend and keep healthy!

Best regards
Walter

-----Original Message-----
From: Giovanni Bracco
Sent: Samstag, 30. 
J?nner 2021 18:08 To: gpfsug main discussion list ; Walter Sklenka Subject: Re: [gpfsug-discuss] OPA HFI and Mellanox HCA on same NSD Server with only ib rdma enabled In our HPC infrastructure we have 6 NSD server, running CentOS 7.4, each of them with with 1 Intel QDR HCA to a QDR Cluster (now 100 nodes SandyBridge cpu it was 300 nodes CentOS 6.5), 1 OPA HCA to the main OPA Cluster (400 nodes Skylake cpu, CentOS 7.3) and 1 Mellanox FDR to DDN storages and it works nicely using RDMA since 2018. GPFS 4.2.3-19. See F. Iannone et al., "CRESCO ENEA HPC clusters: a working example of a multifabric GPFS Spectrum Scale layout," 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, 2019, pp. 1051-1052, doi: 10.1109/HPCS48598.2019.918813 When setting up the system the main trick has been: just use CentOS drivers and do not install OFED We do not use IPoIB. Giovanni On 30/01/21 06:45, Walter Sklenka wrote: > Hi! > > Is it possible to mix OPAcards and Infininiband HCAs on the same server? > > In the faq > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq. > html#rdma > > > They talk about RDMA : > > "RDMA is NOT ?supported on a node when both Mellanox HCAs and Intel > Omni-Path HFIs are ENABLED for RDMA." > > So do I understand right: When we do NOT enable ?the opa interface we > can still enable IB ? > > The reason I ask ?is, that we have a gpfs cluster of 6 NSD Servers ? > (wih access to storage) ?with opa interfaces which provide access to > remote cluster ?also via OPA. > > A new cluster with HDR interfaces will be implemented soon > > They shell have access to the same filesystems > > When we add HDR interfaces to? NSD servers? and enable rdma on this > network ?while disabling rdma on opa we would accept the worse > performance via opa . We hope that this provides ?still better perf > and less technical overhead ?than using routers > > Or am I totally wrong? > > Thank you very much and keep healthy! > > Best regards > > Walter > > Mit freundlichen Gr??en > */Walter Sklenka/* > */Technical Consultant/* > > EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 > Wien > Tel: +43 1 29 22 165-31 > Fax: +43 1 29 22 165-90 > E-Mail: sklenka at edv-design.at > Internet: www.edv-design.at > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco
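As a closing note on this thread, the configuration Walter describes - verbs RDMA enabled only on the Mellanox HDR ports, with the OPA HFI left out of RDMA - would, roughly, come down to something like the sketch below. The device name (mlx5_0), node class (nsdNodes) and port number are placeholders and should be checked against the ibv_devices output on the actual NSD servers; this is an illustration, not a verified recipe.

    # enable verbs RDMA on the NSD servers, but list only the Mellanox ports in
    # verbsPorts, so the Omni-Path HFI is not used for RDMA (it can still carry IP)
    mmchconfig verbsRdma=enable -N nsdNodes
    mmchconfig verbsPorts="mlx5_0/1" -N nsdNodes

    # the verbs settings take effect after GPFS is restarted on those nodes
    mmshutdown -N nsdNodes
    mmstartup -N nsdNodes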