From jonathan at buzzard.me.uk Sat Jul 1 10:20:18 2017
From: jonathan at buzzard.me.uk (Jonathan Buzzard)
Date: Sat, 1 Jul 2017 10:20:18 +0100
Subject: [gpfsug-discuss] Mass UID migration suggestions
In-Reply-To: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu>
References: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu>
Message-ID:

On 30/06/17 16:20, hpc-luke at uconn.edu wrote:
> Hello,
>
> We're trying to change most of our users uids, is there a clean way to
> migrate all of one users files with say `mmapplypolicy`? We have to change the
> owner of around 273539588 files, and my estimates for runtime are around 6 days.
>
> What we've been doing is indexing all of the files and splitting them up by
> owner which takes around an hour, and then we were locking the user out while we
> chown their files. I made it multi threaded as it weirdly gave a 10% speedup
> despite my expectation that multi threading access from a single node would not
> give any speedup.
>
> Generally I'm looking for advice on how to make the chowning faster. Would
> spreading the chowning processes over multiple nodes improve performance? Should
> I not stat the files before running lchown on them, since lchown checks the file
> before changing it? I saw mention of inodescan(), in an old gpfsug email, which
> speeds up disk read access, by not guaranteeing that the data is up to date. We
> have a maintenance day coming up where all users will be locked out, so the file
> handles(?) from GPFS's perspective will not be able to go stale. Is there a
> function with similar constraints to inodescan that I can use to speed up this
> process?

My suggestion is to do some development work in C and write a custom
program to do it for you. That way you can hook into the GPFS API and
leverage the fast file system scanning interface. Take a look at the
tsbackup.C file in the samples directory.

Obviously this is going to require someone with appropriate coding
skills to develop it. On the other hand, given that it is a one-off and
the input is strictly controlled, error checking can be kept minimal,
so it should come to a couple of hundred lines of C at most.

My tip for speeding things up would be to load the new UIDs into a
sparse array, so you can just use the current UID to index into the
array for the new UID. It burns RAM, but these days RAM is cheap and
plentiful, and speed is the major consideration here.

In theory you should be able to do the whole migration in a few hours
with this technique.

One thing to bear in mind is that once the UID change is complete you
will have to back up the entire file system again.

JAB.

--
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

From ilan84 at gmail.com Tue Jul 4 09:16:43 2017
From: ilan84 at gmail.com (Ilan Schwarts)
Date: Tue, 4 Jul 2017 11:16:43 +0300
Subject: [gpfsug-discuss] Fail to mount file system
Message-ID:

Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I
am trying to make it work.
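Returning briefly to the mass UID migration question above: before committing to the C tool Jonathan describes, the lookup-table idea is easy to prototype in shell. This is only a sketch: the file names uidmap.txt ("olduid newuid" pairs, one per line) and files.list (one path per line, the sort of list an mmapplypolicy LIST rule can produce) are made up for the illustration, and the per-file fork for stat/chown is exactly the overhead the GPFS-API version in C would avoid.

#!/bin/bash
# Sketch: remap file ownership via a lookup table of old UID -> new UID.
declare -A newuid
while read -r old new; do
    newuid[$old]=$new
done < uidmap.txt

while IFS= read -r path; do
    cur=$(stat -c %u -- "$path")               # current numeric owner
    if [[ -n "${newuid[$cur]:-}" ]]; then
        chown -h "${newuid[$cur]}" -- "$path"  # -h: do not follow symlinks
    fi
done < files.list

Splitting files.list into per-node chunks is then a cheap way to test whether spreading the chown load over several nodes actually helps.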
There are 2 nodes in a cluster: [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active The Cluster status is: [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: MyCluster.LH20-GPFS2 GPFS cluster id: 10777108240438931454 GPFS UID domain: MyCluster.LH20-GPFS2 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 There is a file system: [root at LH20-GPFS1 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- fs_gpfs01 nynsd1 (directly attached) fs_gpfs01 nynsd2 (directly attached) [root at LH20-GPFS1 ~]# On each Node, There is folder /fs_gpfs01 The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. Whilte executing mmmount i get exception: [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. What am i doing wrong ? From scale at us.ibm.com Tue Jul 4 09:36:43 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 14:06:43 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 lab"? Is the file system corrupted ? Maybe this error is then due to file system corruption. Can you once try: mmmount fs_gpfs01 -a If this does not work then try: mmmount -o rs fs_gpfs01 Let me know which mount is working. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: gpfsug-discuss at spectrumscale.org Date: 07/04/2017 01:47 PM Subject: [gpfsug-discuss] Fail to mount file system Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I am trying to make it work. 
There are 2 nodes in a cluster: [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active The Cluster status is: [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: MyCluster.LH20-GPFS2 GPFS cluster id: 10777108240438931454 GPFS UID domain: MyCluster.LH20-GPFS2 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 There is a file system: [root at LH20-GPFS1 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- fs_gpfs01 nynsd1 (directly attached) fs_gpfs01 nynsd2 (directly attached) [root at LH20-GPFS1 ~]# On each Node, There is folder /fs_gpfs01 The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. Whilte executing mmmount i get exception: [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. What am i doing wrong ? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 09:38:28 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 11:38:28 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: I mean the person tried to configure it... didnt do good job so now its me to continue On Jul 4, 2017 11:37, "IBM Spectrum Scale" wrote: > What exactly do you mean by "I have received existing corrupted GPFS > 4.2.2 lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > ------------------------------------------------------------ > --------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine > cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jul 4 11:54:52 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 4 Jul 2017 10:54:52 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Message-ID: Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 11:56:20 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 13:56:20 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. 
[root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts From S.J.Thompson at bham.ac.uk Tue Jul 4 12:09:18 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 4 Jul 2017 11:09:18 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Message-ID: AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I?ve upgraded nodes one at a time over the course of a few days. Is the impact just that we won?t be supported, or will a hole open up beneath my feet and swallow me whole? I really don?t fancy the headache of getting approvals to get an outage of even 5 minutes at 6am?. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Tue Jul 4 12:12:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 4 Jul 2017 11:12:10 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 4 17:28:07 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 21:58:07 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: My bad gave the wrong command, the right one is: mmmount fs_gpfs01 -o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 17:46:17 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 19:46:17 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Yes I am ok with deleting. I follow a guide from john olsen at the ibm team from tuscon.. but the guide had steps after the gpfs setup... Is there step by step guide for gpfs cluster setup other than the one in the ibm site? Thank My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------ ------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111- 0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system ------------------------------ [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > ------------------------------------------------------------ --------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcatana at gmail.com Tue Jul 4 17:47:09 2017 From: jcatana at gmail.com (Josh Catana) Date: Tue, 4 Jul 2017 12:47:09 -0400 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Check /var/adm/ras/mmfs.log.latest The dmesg xfs bug is probably from boot if you look at the dmesg with -T to show the timestamp On Jul 4, 2017 12:29 PM, "IBM Spectrum Scale" wrote: > My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs > > Also can you send output of mmlsnsd -X, need to check device type of the > NSDs. > > Are you ok with deleting the file system and disks and building everything > from scratch? > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ilan Schwarts > To: IBM Spectrum Scale > Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main > discussion list > Date: 07/04/2017 04:26 PM > Subject: Re: [gpfsug-discuss] Fail to mount file system > ------------------------------ > > > > [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a > Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... > LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmdsh: LH20-GPFS1 remote shell process had return code 32. > LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle > mmdsh: LH20-GPFS2 remote shell process had return code 32. > mmmount: Command failed. Examine previous error messages to determine > cause. > > [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 > mmmount: Mount point can not be a relative path name: rs > [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 > mmmount: Mount point can not be a relative path name: rs > > > > I recieve in "dmesg": > > [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk > [ 141.363422] hvt_cn_callback: unexpected netlink message! > [ 141.366153] hvt_cn_callback: unexpected netlink message! > [ 4479.292850] tracedev: loading out-of-tree module taints kernel. > [ 4479.292888] tracedev: module verification failed: signature and/or > required key missing - tainting kernel > [ 4482.928413] ------------[ cut here ]------------ > [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 > xfs_do_writepage+0x537/0x550 [xfs]() > [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) > tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 > mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils > i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc > binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif > crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc > hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy > libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod > [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE > ------------ 3.10.0-514.21.2.el7.x86_64 #1 > > On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale > wrote: > > What exactly do you mean by "I have received existing corrupted GPFS > 4.2.2 > > lab"? > > Is the file system corrupted ? Maybe this error is then due to file > system > > corruption. > > > > Can you once try: mmmount fs_gpfs01 -a > > If this does not work then try: mmmount -o rs fs_gpfs01 > > > > Let me know which mount is working. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------ > ------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > > (GPFS), then please post it to the public IBM developerWroks Forum at > > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) > > and you have an IBM software maintenance contract please contact > > 1-800-237-5511 <(800)%20237-5511> in the United States or your local > IBM Service Center in > > other countries. > > > > The forum is informally monitored as time permits and should not be used > for > > priority messages to the Spectrum Scale (GPFS) team. 
> > > > > > > > From: Ilan Schwarts > > To: gpfsug-discuss at spectrumscale.org > > Date: 07/04/2017 01:47 PM > > Subject: [gpfsug-discuss] Fail to mount file system > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ________________________________ > > > > > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > > am trying to make it work. > > There are 2 nodes in a cluster: > > [root at LH20-GPFS1 ~]# mmgetstate -a > > > > Node number Node name GPFS state > > ------------------------------------------ > > 1 LH20-GPFS1 active > > 3 LH20-GPFS2 active > > > > The Cluster status is: > > [root at LH20-GPFS1 ~]# mmlscluster > > > > GPFS cluster information > > ======================== > > GPFS cluster name: MyCluster.LH20-GPFS2 > > GPFS cluster id: 10777108240438931454 > > GPFS UID domain: MyCluster.LH20-GPFS2 > > Remote shell command: /usr/bin/ssh > > Remote file copy command: /usr/bin/scp > > Repository type: CCR > > > > Node Daemon node name IP address Admin node name Designation > > -------------------------------------------------------------------- > > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > > > There is a file system: > > [root at LH20-GPFS1 ~]# mmlsnsd > > > > File system Disk name NSD servers > > ------------------------------------------------------------ > --------------- > > fs_gpfs01 nynsd1 (directly attached) > > fs_gpfs01 nynsd2 (directly attached) > > > > [root at LH20-GPFS1 ~]# > > > > On each Node, There is folder /fs_gpfs01 > > The next step is to mount this fs_gpfs01 to be synced between the 2 > nodes. > > Whilte executing mmmount i get exception: > > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > > mmmount: Command failed. Examine previous error messages to determine > cause. > > > > > > What am i doing wrong ? > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > -- > > > - > Ilan Schwarts > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 4 19:15:49 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 23:45:49 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: You can refer to the concepts, planning and installation guide at the link ( https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1xx_library_prodoc.htm ) for finding detailed steps on setting up a cluster or creating a file system. Or open a PMR and work with IBM support to set it up. 
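Collected into one place, the sequence in the steps below comes down to something like this, using the file system and NSD names from this thread. It is only a sketch: /dev/sdb and /dev/sdc are placeholders for whatever disks actually back nynsd1 and nynsd2 on the nodes, the NSD definitions are written as a stanza file (the colon-separated descriptors shown below carry the same information), and the mmcrfs options are just the example values used there.

# destroy the existing (corrupted) file system
mmumount fs_gpfs01 -a
mmdelfs fs_gpfs01
mmdelnsd "nynsd1;nynsd2"

# recreate the NSDs from the local disks
cat > /tmp/nsd <<'EOF'
%nsd: device=/dev/sdb nsd=nynsd1 usage=dataAndMetadata failureGroup=1 pool=system
%nsd: device=/dev/sdc nsd=nynsd2 usage=dataAndMetadata failureGroup=2 pool=system
EOF
mmcrnsd -F /tmp/nsd

# recreate and mount the file system
mmcrfs /dev/fs_gpfs01 -F /tmp/nsd -A yes -B 256K -n 32 -m 2 -r 2 -T /fs_gpfs01
mmmount fs_gpfs01 -a

Bear in mind this destroys whatever is left on the existing file system, and if the old disks are not reachable at all, mmdelnsd may need the NSD-volume-ID form described in its man page.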
In your case (just as an example) you can use the below simple steps to delete and recreate the file system: 1) To delete file system and NSDs: a) Unmount file system - mmumount -a b) Delete file system - mmdelfs c) Delete NSDs - mmdelnsd "nynsd1;nynsd2" 2) To create file system with both disks in one system pool and having dataAndMetadata and data and metadata replica and directly attached to the nodes, you can use following steps: a) Create a /tmp/nsd file and fill it up with below information :::dataAndMetadata:1:nynsd1:system :::dataAndMetadata:2:nynsd2:system b) Use mmcrnsd -F /tmp/nsd to create NSDs c) Create file system using (just an example with assumptions on config) - mmcrfs /dev/fs_gpfs01 -F /tmp/nsd -A yes -B 256K -n 32 -m 2 -r 2 -T /fs_gpfs01 You can refer to above guide for configuring it in other ways as you want. If you have any issues with these steps you can raise PMR and follow proper channel to setup file system as well. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 10:16 PM Subject: Re: [gpfsug-discuss] Fail to mount file system Yes I am ok with deleting. I follow a guide from john olsen at the ibm team from tuscon.. but the guide had steps after the gpfs setup... Is there step by step guide for gpfs cluster setup other than the one in the ibm site? Thank My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... 
LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Wed Jul 5 08:02:19 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 10:02:19 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Hi, [root at LH20-GPFS2 ~]# mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- nynsd1 0A0A9E3D594D5CA8 - - LH20-GPFS2 (not found) directly attached nynsd2 0A0A9E3D594D5CA9 - - LH20-GPFS2 (not found) directly attached mmmount failed with -o rs root at LH20-GPFS2 ~]# mmmount fs_gpfs01 -o rs Wed Jul 5 09:58:29 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. and in logs /var/adm/ras/mmfs.log.latest: 2017-07-05_09:58:30.009+0300: [I] Command: mount fs_gpfs01 2017-07-05_09:58:30.890+0300: Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: Wrong medium type 2017-07-05_09:58:30.890+0300: [E] Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: [W] Command: err 48: mount fs_gpfs01 From scale at us.ibm.com Wed Jul 5 08:44:19 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 5 Jul 2017 13:14:19 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: >From mmlsnsd output can see that the disks are not found by gpfs (maybe some connection issue or they have been changed/removed from backend) Please open a PMR and work with IBM support to resolve this. 
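In the meantime it is worth confirming whether the operating system on each node still sees the disks behind nynsd1 and nynsd2 at all, and whether GPFS is scanning the right device names. Roughly along these lines (a sketch; the nsddevices step is only needed if the disks do show up but GPFS still cannot match them):

# does the OS still see the backing disks on this node?
lsblk                   # or: cat /proc/partitions

# which devices did GPFS manage to match to the NSDs?
mmlsnsd -X              # "(not found)" means no local device matched

# if the disks are visible but under names GPFS does not scan by default,
# they can be declared through the nsddevices user exit:
cp /usr/lpp/mmfs/samples/nsddevices.sample /var/mmfs/etc/nsddevices
chmod +x /var/mmfs/etc/nsddevices
# then edit /var/mmfs/etc/nsddevices so it echoes the local device names,
# following the comments in the sample, and re-run mmlsnsd -X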
Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.


From: Ilan Schwarts
To: IBM Spectrum Scale
Cc: gpfsug main discussion list , gpfsug-discuss-bounces at spectrumscale.org
Date: 07/05/2017 12:32 PM
Subject: Re: [gpfsug-discuss] Fail to mount file system

Hi,
[root at LH20-GPFS2 ~]# mmlsnsd -X

 Disk name    NSD volume ID      Device   Devtype  Node name                Remarks
---------------------------------------------------------------------------------------------------
 nynsd1       0A0A9E3D594D5CA8   -        -        LH20-GPFS2 (not found)   directly attached
 nynsd2       0A0A9E3D594D5CA9   -        -        LH20-GPFS2 (not found)   directly attached

mmmount failed with -o rs
root at LH20-GPFS2 ~]# mmmount fs_gpfs01 -o rs
Wed Jul 5 09:58:29 IDT 2017: mmmount: Mounting file systems ...
mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type
mmmount: Command failed. Examine previous error messages to determine cause.

and in logs /var/adm/ras/mmfs.log.latest:
2017-07-05_09:58:30.009+0300: [I] Command: mount fs_gpfs01
2017-07-05_09:58:30.890+0300: Failed to open fs_gpfs01.
2017-07-05_09:58:30.890+0300: Wrong medium type
2017-07-05_09:58:30.890+0300: [E] Failed to open fs_gpfs01.
2017-07-05_09:58:30.890+0300: [W] Command: err 48: mount fs_gpfs01

From UWEFALKE at de.ibm.com Wed Jul 5 09:00:23 2017
From: UWEFALKE at de.ibm.com (Uwe Falke)
Date: Wed, 5 Jul 2017 10:00:23 +0200
Subject: [gpfsug-discuss] Fail to mount file system
In-Reply-To:
References:
Message-ID:

Hi,
maybe you need to specify your NSDs via the nsddevices user exit (it identifies local physical devices that are used as GPFS Network Shared Disks (NSDs)). Write a script that lists the NSD devices and place it under /var/mmfs/etc/nsddevices. There is a template under /usr/lpp/mmfs/samples/nsddevices.sample which should provide the necessary details.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke

IT Specialist
High Performance Computing Services / Integrated Technology Services / Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefalke at de.ibm.com
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: Andreas Hasse, Thomas Wolter
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122

From ilan84 at gmail.com Wed Jul 5 13:12:14 2017
From: ilan84 at gmail.com (Ilan Schwarts)
Date: Wed, 5 Jul 2017 15:12:14 +0300
Subject: [gpfsug-discuss] update smb package ?
Message-ID: Hi, while trying to enable SMB service i receive the following root at LH20-GPFS1 ~]# mmces service enable smb LH20-GPFS1: Cannot enable SMB service on LH20-GPFS1 LH20-GPFS1: mmcesop: Prerequisite libraries not found or correct version not LH20-GPFS1: installed. Ensure gpfs.smb is properly installed. LH20-GPFS1: mmcesop: Command failed. Examine previous error messages to determine cause. mmdsh: LH20-GPFS1 remote shell process had return code 1. Do i use normal yum update ? how to solve this issue ? Thanks From ilan84 at gmail.com Wed Jul 5 13:18:54 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 15:18:54 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs gpfs.ext-4.2.2-0.x86_64 gpfs.msg.en_US-4.2.2-0.noarch gpfs.gui-4.2.2-0.noarch gpfs.gpl-4.2.2-0.noarch gpfs.gskit-8.0.50-57.x86_64 gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 gpfs.adv-4.2.2-0.x86_64 gpfs.java-4.2.2-0.x86_64 gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 gpfs.base-4.2.2-0.x86_64 gpfs.crypto-4.2.2-0.x86_64 [root at LH20-GPFS1 ~]# uname -a Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux [root at LH20-GPFS1 ~]# From r.sobey at imperial.ac.uk Wed Jul 5 13:23:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 5 Jul 2017 12:23:10 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: You don't have the gpfs.smb package installed. Yum install gpfs.smb Or install the package manually from /usr/lpp/mmfs//smb_rpms [root at ces ~]# rpm -qa | grep gpfs gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan Schwarts Sent: 05 July 2017 13:19 To: gpfsug main discussion list Subject: [gpfsug-discuss] Fwd: update smb package ? [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs gpfs.ext-4.2.2-0.x86_64 gpfs.msg.en_US-4.2.2-0.noarch gpfs.gui-4.2.2-0.noarch gpfs.gpl-4.2.2-0.noarch gpfs.gskit-8.0.50-57.x86_64 gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 gpfs.adv-4.2.2-0.x86_64 gpfs.java-4.2.2-0.x86_64 gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 gpfs.base-4.2.2-0.x86_64 gpfs.crypto-4.2.2-0.x86_64 [root at LH20-GPFS1 ~]# uname -a Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux [root at LH20-GPFS1 ~]# _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Wed Jul 5 13:29:11 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 15:29:11 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? 
In-Reply-To: References: Message-ID: 

[root at LH20-GPFS1 ~]# yum install gpfs.smb
Loaded plugins: fastestmirror, langpacks
base                                | 3.6 kB  00:00:00
epel/x86_64/metalink                |  24 kB  00:00:00
epel                                | 4.3 kB  00:00:00
extras                              | 3.4 kB  00:00:00
updates                             | 3.4 kB  00:00:00
(1/4): epel/x86_64/updateinfo       | 789 kB  00:00:00
(2/4): extras/7/x86_64/primary_db   | 188 kB  00:00:00
(3/4): epel/x86_64/primary_db       | 4.8 MB  00:00:00
(4/4): updates/7/x86_64/primary_db  | 7.7 MB  00:00:01
Loading mirror speeds from cached hostfile
 * base: centos.spd.co.il
 * epel: mirror.nonstop.co.il
 * extras: centos.spd.co.il
 * updates: centos.spd.co.il
No package gpfs.smb available.
Error: Nothing to do [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ something is missing in my machine :) On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: > You don't have the gpfs.smb package installed. > > > > Yum install gpfs.smb > > > > Or install the package manually from /usr/lpp/mmfs//smb_rpms > > > > [root at ces ~]# rpm -qa | grep gpfs > > gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 > > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan > Schwarts > Sent: 05 July 2017 13:19 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Fwd: update smb package ? > > > > [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs > > gpfs.ext-4.2.2-0.x86_64 > > gpfs.msg.en_US-4.2.2-0.noarch > > gpfs.gui-4.2.2-0.noarch > > gpfs.gpl-4.2.2-0.noarch > > gpfs.gskit-8.0.50-57.x86_64 > > gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 > > gpfs.adv-4.2.2-0.x86_64 > > gpfs.java-4.2.2-0.x86_64 > > gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 > > gpfs.base-4.2.2-0.x86_64 > > gpfs.crypto-4.2.2-0.x86_64 > > [root at LH20-GPFS1 ~]# uname -a > > Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 > > 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux > > [root at LH20-GPFS1 ~]# > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From ilan84 at gmail.com Wed Jul 5 14:08:39 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 16:08:39 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: Sorry for newbish question, What do you mean by "from Fix Central", Do i need to define another repository for the yum ? or download manually ? its spectrum scale 4.2.2 On Wed, Jul 5, 2017 at 3:41 PM, Sobey, Richard A wrote: > Ah... yes you need to download the protocols version of gpfs from Fix Central. Same GPFS but with the SMB/Object etc packages. > > -----Original Message----- > From: Ilan Schwarts [mailto:ilan84 at gmail.com] > Sent: 05 July 2017 13:29 > To: gpfsug main discussion list ; Sobey, Richard A > Subject: Re: [gpfsug-discuss] Fwd: update smb package ? > > [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base > > | 3.6 kB 00:00:00 > epel/x86_64/metalink > > | 24 kB 00:00:00 > epel > > | 4.3 kB 00:00:00 > extras > > | 3.4 kB 00:00:00 > updates > > | 3.4 kB 00:00:00 > (1/4): epel/x86_64/updateinfo > > | 789 kB 00:00:00 > (2/4): extras/7/x86_64/primary_db > > | 188 kB 00:00:00 > (3/4): epel/x86_64/primary_db > > | 4.8 MB 00:00:00 > (4/4): updates/7/x86_64/primary_db > > | 7.7 MB 00:00:01 > Loading mirror speeds from cached hostfile > * base: centos.spd.co.il > * epel: mirror.nonstop.co.il > * extras: centos.spd.co.il > * updates: centos.spd.co.il > No package gpfs.smb available. > Error: Nothing to do > > > [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ > gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ > > > something is missing in my machine :) > > > On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: >> You don't have the gpfs.smb package installed. 
>> >> >> >> Yum install gpfs.smb >> >> >> >> Or install the package manually from /usr/lpp/mmfs//smb_rpms >> >> >> >> [root at ces ~]# rpm -qa | grep gpfs >> >> gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 >> >> >> >> >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan >> Schwarts >> Sent: 05 July 2017 13:19 >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] Fwd: update smb package ? >> >> >> >> [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs >> >> gpfs.ext-4.2.2-0.x86_64 >> >> gpfs.msg.en_US-4.2.2-0.noarch >> >> gpfs.gui-4.2.2-0.noarch >> >> gpfs.gpl-4.2.2-0.noarch >> >> gpfs.gskit-8.0.50-57.x86_64 >> >> gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 >> >> gpfs.adv-4.2.2-0.x86_64 >> >> gpfs.java-4.2.2-0.x86_64 >> >> gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 >> >> gpfs.base-4.2.2-0.x86_64 >> >> gpfs.crypto-4.2.2-0.x86_64 >> >> [root at LH20-GPFS1 ~]# uname -a >> >> Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 >> >> 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux >> >> [root at LH20-GPFS1 ~]# >> >> _______________________________________________ >> >> gpfsug-discuss mailing list >> >> gpfsug-discuss at spectrumscale.org >> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > -- > > > - > Ilan Schwarts -- - Ilan Schwarts From S.J.Thompson at bham.ac.uk Wed Jul 5 14:40:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 5 Jul 2017 13:40:46 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: IBM code comes from either IBM Passport Advantage (where you sign in with a corporate account that lists your product associations), or from IBM Fix Central (google it). Fix Central is supposed to be for service updates. Give the lack of experience, you may want to look at the install toolkit which ships with Spectrum Scale. Simon On 05/07/2017, 14:08, "gpfsug-discuss-bounces at spectrumscale.org on behalf of ilan84 at gmail.com" wrote: >Sorry for newbish question, >What do you mean by "from Fix Central", >Do i need to define another repository for the yum ? or download manually >? >its spectrum scale 4.2.2 > >On Wed, Jul 5, 2017 at 3:41 PM, Sobey, Richard A >wrote: >> Ah... yes you need to download the protocols version of gpfs from Fix >>Central. Same GPFS but with the SMB/Object etc packages. >> >> -----Original Message----- >> From: Ilan Schwarts [mailto:ilan84 at gmail.com] >> Sent: 05 July 2017 13:29 >> To: gpfsug main discussion list ; >>Sobey, Richard A >> Subject: Re: [gpfsug-discuss] Fwd: update smb package ? >> >> [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: >>fastestmirror, langpacks base >> >> | 3.6 kB 00:00:00 >> epel/x86_64/metalink >> >> | 24 kB 00:00:00 >> epel >> >> | 4.3 kB 00:00:00 >> extras >> >> | 3.4 kB 00:00:00 >> updates >> >> | 3.4 kB 00:00:00 >> (1/4): epel/x86_64/updateinfo >> >> | 789 kB 00:00:00 >> (2/4): extras/7/x86_64/primary_db >> >> | 188 kB 00:00:00 >> (3/4): epel/x86_64/primary_db >> >> | 4.8 MB 00:00:00 >> (4/4): updates/7/x86_64/primary_db >> >> | 7.7 MB 00:00:01 >> Loading mirror speeds from cached hostfile >> * base: centos.spd.co.il >> * epel: mirror.nonstop.co.il >> * extras: centos.spd.co.il >> * updates: centos.spd.co.il >> No package gpfs.smb available. 
>> Error: Nothing to do >> >> >> [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ >> gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ >> >> >> something is missing in my machine :) >> >> >> On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A >> wrote: >>> You don't have the gpfs.smb package installed. >>> >>> >>> >>> Yum install gpfs.smb >>> >>> >>> >>> Or install the package manually from /usr/lpp/mmfs//smb_rpms >>> >>> >>> >>> [root at ces ~]# rpm -qa | grep gpfs >>> >>> gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 >>> >>> >>> >>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan >>> Schwarts >>> Sent: 05 July 2017 13:19 >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] Fwd: update smb package ? >>> >>> >>> >>> [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs >>> >>> gpfs.ext-4.2.2-0.x86_64 >>> >>> gpfs.msg.en_US-4.2.2-0.noarch >>> >>> gpfs.gui-4.2.2-0.noarch >>> >>> gpfs.gpl-4.2.2-0.noarch >>> >>> gpfs.gskit-8.0.50-57.x86_64 >>> >>> gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 >>> >>> gpfs.adv-4.2.2-0.x86_64 >>> >>> gpfs.java-4.2.2-0.x86_64 >>> >>> gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 >>> >>> gpfs.base-4.2.2-0.x86_64 >>> >>> gpfs.crypto-4.2.2-0.x86_64 >>> >>> [root at LH20-GPFS1 ~]# uname -a >>> >>> Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 >>> >>> 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux >>> >>> [root at LH20-GPFS1 ~]# >>> >>> _______________________________________________ >>> >>> gpfsug-discuss mailing list >>> >>> gpfsug-discuss at spectrumscale.org >>> >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> -- >> >> >> - >> Ilan Schwarts > > > >-- > > >- >Ilan Schwarts >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From hpc-luke at uconn.edu Wed Jul 5 15:52:52 2017 From: hpc-luke at uconn.edu (hpc-luke at uconn.edu) Date: Wed, 05 Jul 2017 10:52:52 -0400 Subject: [gpfsug-discuss] Mass UID migration suggestions Message-ID: <595cfd44.kc2G2OUXdgiX+srO%hpc-luke@uconn.edu> Thank you both, I was already using the c++ stl hash map to do the mapping of uid_t to uid_t, but I will use that example to learn how to use the proper gpfs apis. And thank you for the ACL suggestion, as that is likely the best way to handle certain users who are logged in/running jobs constantly, where we would not like to force them to logout. And thank you for the reminder to re-run backups. Thank you for your time, Luke Storrs-HPC University of Connecticut From mweil at wustl.edu Wed Jul 5 16:51:50 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 5 Jul 2017 10:51:50 -0500 Subject: [gpfsug-discuss] pmcollector node Message-ID: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> Hello all, Question on the requirements on pmcollector node/s for a 500+ node cluster. Is there a sizing guide? What specifics should we scale? CPU Disks memory? Thanks Matt From kkr at lbl.gov Wed Jul 5 17:23:38 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 5 Jul 2017 09:23:38 -0700 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Message-ID: As I understand it, there is currently no way to collect just a subset of stats in a category. 
For example, CPU stats are: cpu_contexts cpu_guest cpu_guest_nice cpu_hiq cpu_idle cpu_interrupts cpu_iowait cpu_nice cpu_siq cpu_steal cpu_system cpu_user but I'm only interested in tracking a subset. The config file seems to want the category "CPU" which seems like an all-or-nothing approach. I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? Thanks, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 5 18:00:44 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 5 Jul 2017 17:00:44 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Message-ID: <11A5144D-A5AF-4829-B7D4-4313F357C6CB@nuance.com> Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Jul 5 19:22:14 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 5 Jul 2017 11:22:14 -0700 Subject: [gpfsug-discuss] Meaning of API Stats Category In-Reply-To: References: Message-ID: Thank you Eric. That did help. On Mon, Jun 12, 2017 at 2:01 PM, IBM Spectrum Scale wrote: > Hello Kristy, > > The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of > view of "applications" in the sense that they provide stats about I/O > requests made to files in GPFS file systems from user level applications > using POSIX interfaces like open(), close(), read(), write(), etc. > > This is in contrast to similarly named sensors without the "API" suffix, > like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O > requests made by the GPFS code to NSDs (disks) making up GPFS file systems. > > The relationship between application I/O and disk I/O might or might not > be obvious. Consider some examples. An application that starts > sequentially reading a file might, at least initially, cause more disk I/O > than expected because GPFS has decided to prefetch data. An application > write() might not immediately cause a the writing of disk blocks due to the > operation of the pagepool. Ultimately, application write()s might cause > twice as much data written to disk due to the replication factor of the > file system. Application I/O concerns itself with user data; disk I/O > might have to occur to handle the user data and associated file system > metadata (like inodes and indirect blocks). > > The difference between GPFSFileSystemAPI and GPFSNodeAPI: > GPFSFileSystemAPI reports stats for application I/O per filesystem per > node; GPFSNodeAPI reports application I/O stats per node. Similarly, > GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode > reports disk I/O stats per node. > > I hope this helps. 
> Eric Agar > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 06/12/2017 04:43 PM > Subject: Re: [gpfsug-discuss] Meaning of API Stats Category > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Kristy > > What I *think* the difference is: > > gpfs_fis: - calls to the GPFS file system interface > gpfs_fs: calls from the node that actually make it to the NSD > server/metadata > > The difference being what?s served out of the local node pagepool. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > *From: * on behalf of Kristy > Kallback-Rose > * Reply-To: *gpfsug main discussion list > > * Date: *Monday, June 12, 2017 at 3:17 PM > * To: *gpfsug main discussion list > * Subject: *[EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category > > Hi, > > Can anyone provide more detail about what is meant by the following two > categories of stats? The PDG has a limited description as far as I could > see. I'm not sure what is meant by Application PoV. Would the Grafana > bridge count as an "application"? > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfadden at us.ibm.com Wed Jul 5 19:50:24 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Wed, 5 Jul 2017 18:50:24 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks In-Reply-To: <11A5144D-A5AF-4829-B7D4-4313F357C6CB@nuance.com> Message-ID: What do You mean by category? Node class, metric type or something else? On Jul 5, 2017, 10:01:33 AM, Robert.Oesterlin at nuance.com wrote: From: Robert.Oesterlin at nuance.com To: gpfsug-discuss at spectrumscale.org Cc: Date: Jul 5, 2017 10:01:33 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sfadden at us.ibm.com Wed Jul 5 19:51:46 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Wed, 5 Jul 2017 18:51:46 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks In-Reply-To: Message-ID: Never mind just saw your earlier email On Jul 5, 2017, 11:50:24 AM, sfadden at us.ibm.com wrote: From: sfadden at us.ibm.com To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: Jul 5, 2017 11:50:24 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks What do You mean by category? Node class, metric type or something else? On Jul 5, 2017, 10:01:33 AM, Robert.Oesterlin at nuance.com wrote: From: Robert.Oesterlin at nuance.com To: gpfsug-discuss at spectrumscale.org Cc: Date: Jul 5, 2017 10:01:33 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Jul 6 06:37:33 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 6 Jul 2017 11:07:33 +0530 Subject: [gpfsug-discuss] pmcollector node In-Reply-To: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> References: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> Message-ID: Hi Anna, Can you please check if you can answer this. Or else let me know who to contact for this. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Matt Weil To: gpfsug-discuss at spectrumscale.org Date: 07/05/2017 09:22 PM Subject: [gpfsug-discuss] pmcollector node Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello all, Question on the requirements on pmcollector node/s for a 500+ node cluster. Is there a sizing guide? What specifics should we scale? CPU Disks memory? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Wei1.Guo at UTSouthwestern.edu Thu Jul 6 18:49:32 2017 From: Wei1.Guo at UTSouthwestern.edu (Wei Guo) Date: Thu, 6 Jul 2017 17:49:32 +0000 Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory Message-ID: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> Hi, All, We are testing to upgrade our clients to new RHEL 7.3 kernel with GPFS 4.2.1.0. When we have 3.10.0-514.26.2.el7, installing the gplbin has the following errors: # ./mmbuildgpl --build-package -v # cd /root/rpmbuild/RPMS/x86_64/ # rpm -ivh gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64.rpm Running transaction Installing : gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64 1/1 depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory depmod: ERROR: fstatat(4, mmfslinux.ko): No such file or directory depmod: ERROR: fstatat(4, tracedev.ko): No such file or directory depmod -a also show the three kernel extension not found. However, in the following directory, they are there. # pwd /lib/modules/3.10.0-514.26.2.el7.x86_64/extra # ls kernel mmfs26.ko mmfslinux.ko tracedev.ko The error does not show in a slightly older kernel -3.10.0-514.21.2 version. From https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Table 29, both versions should be supported. RHEL Distribution Latest Kernel Level Tested1 Minimum Kernel Level Required2 Minimum IBM Spectrum Scale Level Tested3 Minimum IBM Spectrum Scale Level Supported4 7.3 3.10.0-514 3.10.0-514 V4.1.1.11/V4.2.2.1 V4.1.1.11/V4.2.1.2 For technical reasons, this test node will not be added to production. A previous thread http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-April/001529.html indicated that this will be OK. However, it is better to get a clear conclusion before we update other client nodes. Shall we recompile the kernel? Thanks all. Wei Guo ________________________________ UT Southwestern Medical Center The future of medicine, today. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 6 18:52:44 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 6 Jul 2017 17:52:44 +0000 Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory In-Reply-To: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> References: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> Message-ID: Look in the kernel weak-updates directory, you will probably find some broken files in there. These come from things trying to update the kernel modules when you do the kernel upgrade. Just delete the three gpfs related ones and run depmod The safest way is to remove the gpfs.gplbin packages, then upgrade the kernel, reboot and add the new gpfs.gplbin packages for the new kernel. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Wei Guo [Wei1.Guo at UTSouthwestern.edu] Sent: 06 July 2017 18:49 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory Hi, All, We are testing to upgrade our clients to new RHEL 7.3 kernel with GPFS 4.2.1.0. 
When we have 3.10.0-514.26.2.el7, installing the gplbin has the following errors: # ./mmbuildgpl --build-package ?v # cd /root/rpmbuild/RPMS/x86_64/ # rpm -ivh gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64.rpm Running transaction Installing : gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64 1/1 depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory depmod: ERROR: fstatat(4, mmfslinux.ko): No such file or directory depmod: ERROR: fstatat(4, tracedev.ko): No such file or directory depmod -a also show the three kernel extension not found. However, in the following directory, they are there. # pwd /lib/modules/3.10.0-514.26.2.el7.x86_64/extra # ls kernel mmfs26.ko mmfslinux.ko tracedev.ko The error does not show in a slightly older kernel -3.10.0-514.21.2 version. From https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Table 29, both versions should be supported. RHEL Distribution Latest Kernel Level Tested1 Minimum Kernel Level Required2 Minimum IBM Spectrum Scale Level Tested3 Minimum IBM Spectrum Scale Level Supported4 7.3 3.10.0-514 3.10.0-514 V4.1.1.11/V4.2.2.1 V4.1.1.11/V4.2.1.2 For technical reasons, this test node will not be added to production. A previous thread http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-April/001529.html indicated that this will be OK. However, it is better to get a clear conclusion before we update other client nodes. Shall we recompile the kernel? Thanks all. Wei Guo ________________________________ UT Southwestern Medical Center The future of medicine, today. From abeattie at au1.ibm.com Thu Jul 6 06:07:07 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 6 Jul 2017 05:07:07 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14992893800360.png Type: image/png Size: 431718 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14992893800362.png Type: image/png Size: 1001127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14993172756190.png Type: image/png Size: 381651 bytes Desc: not available URL: From neil.wilson at metoffice.gov.uk Fri Jul 7 10:18:40 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Fri, 7 Jul 2017 09:18:40 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: Hi Andrew, Have you created new dashboards for GPFS? This shows you how to do it https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Creating%20Grafana%20dashboard Alternatively there are some predefined dashboards here https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Importing%20predefined%20Grafana%20dashboards that you can import and have a play around with? Regards Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. 
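As a quick aside on the "services running but no data" symptom: before reworking dashboards it can help to confirm where the data stops. A minimal sketch follows, assuming the 4.2.x mmperfmon query syntax and the default bridge port (4242) from the bridge setup guide; the /api/suggest call is the OpenTSDB-style request that Grafana's metric lookup uses, so both the port and the URL are assumptions to adjust for your environment.

# 1. On a collector node, ask the collector for a metric directly.
#    If this prints rows of values, sensors and collector are fine and
#    the gap is in the bridge or in the Grafana data source definition.
mmperfmon query cpu_user

# 2. Check that the Grafana bridge (zimonGrafanaIntf) is listening and
#    answering; replace <collector-node> with the host running the bridge.
ss -ltnp | grep 4242
curl -s "http://<collector-node>:4242/api/suggest?type=metrics" | head -c 300
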
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 06 July 2017 06:07 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Greetings, I'm currently setting up Grafana to interact with one of our Scale Clusters and i've followed the knowledge centre link in terms of setup. https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm However while everything appears to be working i'm not seeing any data coming through the reports within the grafana server, even though I can see data in the Scale GUI The current environment: [root at sc01n02 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: sc01.spectrum GPFS cluster id: 18085710661892594990 GPFS UID domain: sc01.spectrum Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------ 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon [root at sc01n02 ~]# [root at sc01n02 ~]# mmlsconfig Configuration data for cluster sc01.spectrum: --------------------------------------------- clusterName sc01.spectrum clusterId 18085710661892594990 autoload yes profile gpfsProtocolDefaults dmapiFileHandleSize 32 minReleaseLevel 4.2.2.0 ccrEnabled yes cipherList AUTHONLY maxblocksize 16M [cesNodes] maxMBpS 5000 numaMemoryInterleave yes enforceFilesetQuotaOnRoot yes workerThreads 512 [common] tscCmdPortRange 60000-61000 cesSharedRoot /ibm/cesSharedRoot/ces cifsBypassTraversalChecking yes syncSambaMetadataOps yes cifsBypassShareLocksOnRename yes adminMode central File systems in cluster sc01.spectrum: -------------------------------------- /dev/cesSharedRoot /dev/icos_demo /dev/scale01 [root at sc01n02 ~]# [root at sc01n02 ~]# systemctl status pmcollector ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. Loaded: loaded (/etc/rc.d/init.d/pmcollector) Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago Docs: man:systemd-sysv-generator(8) Main PID: 2693 (ZIMonCollector) CGroup: /system.slice/pmcollector.service ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance mon...... May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor collector... May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance moni...r.. Hint: Some lines were ellipsized, use -l to show in full. From Grafana Server: [cid:image002.jpg at 01D2F70A.17F595F0] when I send a set of files to the cluster (3.8GB) I can see performance metrics within the Scale GUI [cid:image004.jpg at 01D2F70A.17F595F0] yet from the Grafana Dashboard im not seeing any data points [cid:image006.jpg at 01D2F70A.17F595F0] Can anyone provide some hints as to what might be happening? Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 14522 bytes Desc: image002.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 60060 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 25781 bytes Desc: image006.jpg URL: From olaf.weiser at de.ibm.com Fri Jul 7 10:18:13 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 7 Jul 2017 09:18:13 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 431718 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1001127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 381651 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Jul 7 13:01:39 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 7 Jul 2017 12:01:39 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: Just following up on this, has anyone successfully deployed Protocols (SMB) on RHEL 7.3 with the 4.2.3-2 packages? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 July 2017 12:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. 
Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Jul 7 23:32:40 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 7 Jul 2017 15:32:40 -0700 Subject: [gpfsug-discuss] Hold the Date - Spectrum Scale Day @ HPCXXL (Sept 2017, NYC) Message-ID: Hello, More details will be provided as they become available, but just so you can make a placeholder on your calendar, there will be a Spectrum Scale Day the week of September 25th - 29th, likely Thursday, September 28th. This will be a part of the larger HPCXXL meeting (https://www.spxxl.org/?q=New-York-City-2017 ). You may recall this group was formerly called SPXXL and the website is in the process of transitioning to the new name (and getting a new certificate). You will be able to attend *just* the Spectrum Scale day if that is the only portion of the event you would like to attend. The NYC location is great for Spectrum Scale events because many IBMers, including developers, can come in from Poughkeepsie. More as we get closer to the date and details are settled. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Sun Jul 9 08:26:44 2017 From: a.khiredine at meteo.dz (Atmane) Date: Sun, 9 Jul 2017 08:26:44 +0100 Subject: [gpfsug-discuss] GPFS Storage Server (GSS) Message-ID: From a.khiredine at meteo.dz Sun Jul 9 09:00:07 2017 From: a.khiredine at meteo.dz (Atmane) Date: Sun, 9 Jul 2017 09:00:07 +0100 Subject: [gpfsug-discuss] get free space in GSS Message-ID: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From laurence at qsplace.co.uk Sun Jul 9 09:58:05 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Sun, 09 Jul 2017 09:58:05 +0100 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: Message-ID: You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: >Dear all, > >My name is Khiredine Atmane and I am a HPC system administrator at the > >National Office of Meteorology Algeria . We have a GSS24 running >gss2.5.10.3-3b and gpfs-4.2.0.3. 
> >GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks >total, 0 >NVRAM partitions > >disks = 3Tb >SSD = 200 Gb >df -h >Filesystem Size Used Avail Use% Mounted on > >/dev/gpfs1 49T 18T 31T 38% /gpfs1 >/dev/gpfs2 53T 13T 40T 25% /gpfs2 >/dev/gpfs3 25T 4.9T 20T 21% /gpfs3 >/dev/gpfs4 11T 133M 11T 1% /gpfs4 >/dev/gpfs5 323T 34T 290T 11% /gpfs5 > >Total Is 461 To > >I think we have more space >Could anyone make recommendation to troubleshoot find how many free >space >in GSS ? >How to find the available space ? >Thank you! > >Atmane > > > >-- >Atmane Khiredine >HPC System Admin | Office National de la M?t?orologie >T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : >a.khiredine at meteo.dz >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Sun Jul 9 13:26:26 2017 From: a.khiredine at meteo.dz (atmane khiredine) Date: Sun, 9 Jul 2017 12:26:26 +0000 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: , Message-ID: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> thank you very much for replying. I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual 
rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. 
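As an aside, a minimal shell sketch for pulling the per-declustered-array free space out of that output; the awk column positions are an assumption based on the -L layout shown in this thread and may differ on other GSS/ESS code levels, so treat it as a starting point rather than a supported tool.

#!/bin/bash
# dafree.sh - print the free space of each declustered array in the
# recovery groups named on the command line, e.g.:
#   ./dafree.sh BB1RGL BB1RGR
# Assumes the "mmlsrecoverygroup <rg> -L" layout above, where the DA
# summary rows carry the free space in columns 7 and 8.
for rg in "$@"; do
    echo "=== $rg ==="
    mmlsrecoverygroup "$rg" -L | awk '
        $1 ~ /^(DA[0-9]+|LOG)$/ && $2 ~ /^(yes|no)$/ {
            printf "  %-4s free space: %s %s\n", $1, $7, $8
        }'
done

Note that this reports unallocated space inside the recovery groups; for the capacity of an existing file system, mmdf <filesystem> is the usual check.
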
From janfrode at tanso.net Sun Jul 9 17:45:32 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sun, 09 Jul 2017 16:45:32 +0000 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> References: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: You had it here: [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low 12 GiB in DA1, and 4096 MiB i DA2, but effectively you'll get less when you add a raidCode to the vdisk. Best way to use it id to just don't specify a size to the vdisk, and max possible size will be used. -jf s?n. 9. jul. 2017 kl. 14.26 skrev atmane khiredine : > thank you very much for replying. I can not find the free space > > Here is the output of mmlsrecoverygroup > > [root at server1 ~]#mmlsrecoverygroup > > declustered > arrays with > recovery group vdisks vdisks servers > ------------------ ----------- ------ ------- > BB1RGL 3 18 server1,server2 > BB1RGR 3 18 server2,server1 > -------------------------------------------------------------- > [root at server ~]# mmlsrecoverygroup BB1RGL -L > > declustered > recovery group arrays vdisks pdisks format version > ----------------- ----------- ------ ------ -------------- > BB1RGL 3 18 119 4.2.0.1 > > declustered needs replace > scrub background activity > array service vdisks pdisks spares threshold free space > duration task progress priority > ----------- ------- ------ ------ ------ --------- ---------- > -------- ------------------------- > LOG no 1 3 0,0 1 558 GiB 14 > days scrub 51% low > DA1 no 11 58 2,31 2 12 GiB 14 > days scrub 78% low > DA2 no 6 58 2,31 2 4096 MiB 14 > days scrub 10% low > > declustered > checksum > vdisk RAID code array vdisk size block > size granularity state remarks > ------------------ ------------------ ----------- ---------- > ---------- ----------- ----- ------- > gss0_logtip 3WayReplication LOG 128 MiB 1 > MiB 512 ok logTip > gss0_loghome 4WayReplication DA1 40 GiB 1 > MiB 512 ok log > BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 > MiB 32 KiB ok > BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 > MiB 32 KiB ok > BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 > MiB 32 KiB ok > BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 > MiB 32 KiB ok > > config data declustered 
array VCD spares actual rebuild > spare space remarks > ------------------ ------------------ ------------- > --------------------------------- ---------------- > rebuild space DA1 31 34 pdisk > rebuild space DA2 31 35 pdisk > > > config data max disk group fault tolerance actual disk group > fault tolerance remarks > ------------------ --------------------------------- > --------------------------------- ---------------- > rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 > drawer limiting fault tolerance > system index 2 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > > vdisk max disk group fault tolerance actual disk group > fault tolerance remarks > ------------------ --------------------------------- > --------------------------------- ---------------- > gss0_logtip 2 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS4_DATA1 2 drawer 2 drawer > BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS1_DATA1 2 drawer 2 drawer > BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS3_DATA1 2 drawer 2 drawer > BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS2_DATA1 2 drawer 2 drawer > BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS2_DATA2 2 drawer 2 drawer > BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS1_DATA2 2 drawer 2 drawer > BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS5_DATA1 2 drawer 2 drawer > BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS5_DATA2 2 drawer 2 drawer > > active recovery group server servers > ----------------------------------------------- ------- > server1 server1,server2 > > > Atmane Khiredine > HPC System Administrator | Office National de la M?t?orologie > T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : > a.khiredine at meteo.dz > ________________________________ > De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] > Envoy? : dimanche 9 juillet 2017 09:58 > ? : gpfsug main discussion list; atmane khiredine; > gpfsug-discuss at spectrumscale.org > Objet : Re: [gpfsug-discuss] get free space in GSS > > You can check the recovery groups to see if there is any remaining space. > > I don't have access to my test system to confirm the syntax however if > memory serves. > > Run mmlsrecoverygroup to get a list of all the recovery groups then: > > mmlsrecoverygroup -L > > This will list all your declustered arrays and their free space. > > Their might be another method, however this way has always worked well for > me. > > -- Lauz > > > > On 9 July 2017 09:00:07 BST, Atmane wrote: > > Dear all, > > My name is Khiredine Atmane and I am a HPC system administrator at the > National Office of Meteorology Algeria . We have a GSS24 running > gss2.5.10.3-3b and gpfs-4.2.0.3. 
> > GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 > NVRAM partitions > > disks = 3Tb > SSD = 200 Gb > df -h > Filesystem Size Used Avail Use% Mounted on > > /dev/gpfs1 49T 18T 31T 38% /gpfs1 > /dev/gpfs2 53T 13T 40T 25% /gpfs2 > /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 > /dev/gpfs4 11T 133M 11T 1% /gpfs4 > /dev/gpfs5 323T 34T 290T 11% /gpfs5 > > Total Is 461 To > > I think we have more space > Could anyone make recommendation to troubleshoot find how many free space > in GSS ? > How to find the available space ? > Thank you! > > Atmane > > > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Sun Jul 9 17:52:02 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Sun, 9 Jul 2017 12:52:02 -0400 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> References: , <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: Hi Atmane, >> I can not find the free space Based on your output below, your setup currently has two recovery groups BB1RGL and BB1RGR. Issue "mmlsrecoverygroup BB1RGL -L" and "mmlsrecoverygroup BB1RGR -L" to obtain free space in each DA. Based on your "mmlsrecoverygroup BB1RGL -L" output below, BB1RGL "DA1" has 12GiB and "DA2" has 4GiB free space. The metadataOnly and dataOnly vdisk/NSD are created from DA1 and DA2. declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low In addition, you may use "mmlsnsd" to obtain mapping of file-system to vdisk/NSD + use "mmdf " command to query user or available capacity on a GPFS file system. Hope this helps, -Kums From: atmane khiredine To: Laurence Horrocks-Barlow , "gpfsug main discussion list" Date: 07/09/2017 08:27 AM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org thank you very much for replying. 
I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 
drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Mon Jul 10 10:39:27 2017 From: a.khiredine at meteo.dz (Atmane) Date: Mon, 10 Jul 2017 10:39:27 +0100 Subject: [gpfsug-discuss] New Version Of GSS 3.1b 16-Feb-2017 Message-ID: Dear all, There is a new version of GSS Is there someone who made the update ? thanks Lenovo System x GPFS Storage Server (GSS) Version 3.1b 16-Feb-2017 What?s new in Lenovo GSS, Version 3.1 ? New features: - RHEL 7.2 ? GSS Expandability ? Online addition of more JBODs to an existing GSS building block (max. 6 JBOD total) ? Must be same JBOD type and drive type as in the existing building block ? Selectable Spectrum Scale (GPFS) software version and edition ?Four GSS tarballs, for Spectrum Scale {Standard or Advanced Edition} @ {v4.1.1 or v4.2.1} ? Hardware news: ? 10TB drive support: two JBOD MTMs (0796-HCJ/16X and 0796-HCK/17X), drive FRU (01GV110), no drive option ? Withdrawal of the 3TB drive models (0796-HC3/07X and 0796-HC4/08X) ? GSS22 in xConfig (no more need for special-bid) ? Software and firmware news: ? Update of IBM Spectrum Scale v4.2.1 to latest PTF level ? Update of Intel OPA from 10.1 to 10.2 (incl. performance fixes) ? 
Refresh of server and adapter FW levels to Scalable Infrastructure ?16C? recommended levels ? Not much news this time, as ?16C? FW is almost identical to ?16B - List GPFS RPM gpfs.adv-4.2.1-2.12.x86_64.rpm gpfs.base-4.2.1-2.12.x86_64.rpm gpfs.callhome-4.2.1-1.000.el7.noarch.rpm gpfs.callhome-ecc-client-4.2.1-1.000.noarch.rpm gpfs.crypto-4.2.1-2.12.x86_64.rpm gpfs.docs-4.2.1-2.12.noarch.rpm gpfs.ext-4.2.1-2.12.x86_64.rpm gpfs.gnr-4.2.1-2.12.x86_64.rpm gpfs.gnr.base-1.0.0-0.x86_64.rpm gpfs.gpl-4.2.1-2.12.noarch.rpm gpfs.gskit-8.0.50-57.x86_64.rpm gpfs.gss.firmware-4.2.0-5.x86_64.rpm gpfs.gss.pmcollector-4.2.2-2.el7.x86_64.rpm gpfs.gss.pmsensors-4.2.2-2.el7.x86_64.rpm gpfs.gui-4.2.1-2.3.noarch.rpm gpfs.java-4.2.2-2.x86_64.rpm gpfs.msg.en_US-4.2.1-2.12.noarch.rpm -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From Greg.Lehmann at csiro.au Tue Jul 11 05:54:39 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Tue, 11 Jul 2017 04:54:39 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: <4c9ae144c1114b85b7f2cdc27eefd749@exch1-cdc.nexus.csiro.au> Yes, although it is early days for us and I would not say we have finished testing as yet. We have upgraded twice to get there from 4.2.3-0. It seems OK and I have not noticed any changes from 4.2.3.0. Greg From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: Friday, 7 July 2017 10:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Just following up on this, has anyone successfully deployed Protocols (SMB) on RHEL 7.3 with the 4.2.3-2 packages? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 July 2017 12:12 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. 
This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Tue Jul 11 10:36:39 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Tue, 11 Jul 2017 09:36:39 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA Message-ID: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 From abeattie at au1.ibm.com Tue Jul 11 11:14:37 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Tue, 11 Jul 2017 10:14:37 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Message-ID: An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Jul 11 15:46:42 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Jul 2017 14:46:42 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Message-ID: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Tuesday, July 11, 2017 5:15 AM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org; jake.carroll at uq.edu.au Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA Bilich, Reach out to Jake Carrol at Uni of QLD UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet and there is LOTS of tuning that you can do to improve how things work Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Billich Heinrich Rainer (PSI)" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [gpfsug-discuss] does AFM support NFS via RDMA Date: Tue, Jul 11, 2017 7:36 PM Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? 
I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jake.carroll at uq.edu.au Tue Jul 11 22:38:43 2017 From: jake.carroll at uq.edu.au (Jake Carroll) Date: Tue, 11 Jul 2017 21:38:43 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Message-ID: <72D0CC62-8663-4072-AFA1-735D75EEBBE1@uq.edu.au> I?ll be there! From: Bryan Banister Date: Wednesday, 12 July 2017 at 12:46 am To: gpfsug main discussion list Cc: Jake Carroll Subject: RE: [gpfsug-discuss] does AFM support NFS via RDMA Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Tuesday, July 11, 2017 5:15 AM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org; jake.carroll at uq.edu.au Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA Bilich, Reach out to Jake Carrol at Uni of QLD UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet and there is LOTS of tuning that you can do to improve how things work Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Billich Heinrich Rainer (PSI)" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [gpfsug-discuss] does AFM support NFS via RDMA Date: Tue, Jul 11, 2017 7:36 PM Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? 
I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Tue Jul 11 23:07:49 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 11 Jul 2017 15:07:49 -0700 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Message-ID: <9BA6A8E3-D633-4DFF-826F-5ACE49361694@lbl.gov> Sounds good. Is someone willing to take on this talk? User-driven talks on real experiences are always welcome. Cheers, Kristy > On Jul 11, 2017, at 7:46 AM, Bryan Banister wrote: > > Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? > -B > > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org ] On Behalf Of Andrew Beattie > Sent: Tuesday, July 11, 2017 5:15 AM > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org ; jake.carroll at uq.edu.au > Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA > > Bilich, > > Reach out to Jake Carrol at Uni of QLD > > UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet > and there is LOTS of tuning that you can do to improve how things work > > Regards, > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: "Billich Heinrich Rainer (PSI)" > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org " > > Cc: > Subject: [gpfsug-discuss] does AFM support NFS via RDMA > Date: Tue, Jul 11, 2017 7:36 PM > > Hello, > > We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? 
> > We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. > > We run spectrum scale 4.2.2/4.2.3 on Redhat 7. > > Thank you, > > Heiner Billich > > -- > Paul Scherrer Institut > Heiner Billich > WHGA 106 > CH 5232 Villigen > 056 310 36 02 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 12 17:06:40 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 12 Jul 2017 16:06:40 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Message-ID: Interesting. Performance is one thing, but how usable. IBM, watch your back :-) ?WekaIO is the world?s fastest distributed file system, processing four times the workload compared to IBM Spectrum Scale measured on Standard Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. Utilizing only 120 cloud compute instances with locally attached storage, WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM?s high-end FlashSystem 900.? https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Jul 12 18:24:19 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 12 Jul 2017 17:24:19 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: while i really like competition on SpecSFS, the claims from the WekaIO people are lets say 'alternative facts' at best The Spectrum Scale results were done on 4 Nodes with 2 Flash Storage devices attached, they compare this to a WekaIO system with 14 times more memory (14 TB vs 1TB) , 120 SSD's (vs 64 Flashcore Modules) across 15 times more compute nodes (60 vs 4) . said all this, the article claims 1000 builds, while the actual submission only delivers 500 --> https://www.spec.org/sfs2014/results/sfs2014.html so they need 14 times more memory and cores and 2 times flash to show twice as many builds at double the response time, i leave this to everybody who understands this facts to judge how great that result really is. 
Said all this, Spectrum Scale scales almost linear if you double the nodes , network and storage accordingly, so there is no reason to believe we couldn't easily beat this, its just a matter of assemble the HW in a lab and run the test. btw we scale to 10k+ nodes , 2500 times the number we used in our publication :-D Sven On Wed, Jul 12, 2017 at 9:06 AM Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > Interesting. Performance is one thing, but how usable. IBM, watch your > back :-) > > > > *?WekaIO is the world?s fastest distributed file system, processing four > times the workload compared to IBM Spectrum Scale measured on Standard > Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry > benchmark. Utilizing only 120 cloud compute instances with locally attached > storage, WekaIO completed 1,000 simultaneous software builds compared to > 240 on IBM?s high-end FlashSystem 900.?* > > > > > https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 <(507)%20269-0413> > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Jul 12 19:20:06 2017 From: ewahl at osc.edu (Edward Wahl) Date: Wed, 12 Jul 2017 14:20:06 -0400 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: <20170712142006.297cc9f2@osc.edu> Ah benchmarks... There are Lies, damn Lies, and then benchmarks. I've been in HPC a while on both the vendor (Cray) and customer side, and until I see Lustre, BeeGFS, Spectrum Scale, StorNext, OrangeFS, CEPH, Gluster, 'Flash in the pan v1', etc. all run on the EXACT same hardware I take ALL benchmarks with a POUND of salt. Too easy to finagle whatever result you want. Besides, benchmarks and real world performance are vastly different unless you are using IO kernels based on your local apps as your benchmark. I have a feeling MANY of the folks on this list feel similarly. ;) I recall when we figured out how someone cheated a SPEC test once by only using the inner-track of drives. ^_^ Ed On Wed, 12 Jul 2017 16:06:40 +0000 "Oesterlin, Robert" wrote: > Interesting. Performance is one thing, but how usable. IBM, watch your > back :-) > > ?WekaIO is the world?s fastest distributed file system, processing four times > the workload compared to IBM Spectrum Scale measured on Standard Performance > Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. > Utilizing only 120 cloud compute instances with locally attached storage, > WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM?s > high-end FlashSystem 900.? > > https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From r.sobey at imperial.ac.uk Wed Jul 12 19:20:32 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 12 Jul 2017 18:20:32 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: I'm reading it as "WeakIO" which probably isn't a good thing.. 
both in the context of my eyesight and the negative connotation of the product :) ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Oesterlin, Robert Sent: 12 July 2017 17:06 To: gpfsug main discussion list Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Interesting. Performance is one thing, but how usable. IBM, watch your back :-) ?WekaIO is the world?s fastest distributed file system, processing four times the workload compared to IBM Spectrum Scale measured on Standard Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. Utilizing only 120 cloud compute instances with locally attached storage, WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM?s high-end FlashSystem 900.? https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 12 19:27:12 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 12 Jul 2017 18:27:12 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Message-ID: <92349D18-3614-4235-B30C-ADCCE3782CDD@nuance.com> Ah yes - Sven keeping us honest! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Wednesday, July 12, 2017 at 12:24 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System while i really like competition on SpecSFS, the claims from the WekaIO people are lets say 'alternative facts' at best The Spectrum Scale results were done on 4 Nodes with 2 Flash Storage devices attached, they compare this to a WekaIO system with 14 times more memory (14 TB vs 1TB) , 120 SSD's (vs 64 Flashcore Modules) across 15 times more compute nodes (60 vs 4) . said all this, the article claims 1000 builds, while the actual submission only delivers 500 --> https://www.spec.org/sfs2014/results/sfs2014.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From sannaik2 at in.ibm.com Fri Jul 14 06:55:30 2017 From: sannaik2 at in.ibm.com (Sandeep Naik1) Date: Fri, 14 Jul 2017 11:25:30 +0530 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: , <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: Hi Atmane, There can be two meaning of available free space? One what is available on existing filesystem. For this you rightly referred to df -h command o/p. This is the actual free space available in already created filesystem. Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 The other is free space available in DA. For which as every one said use mmlsrecoverygroup -L Please note that is will give you raw free capacity. For usable free capacity in DA you have to add RAID over head. But based on your o/p you have very little/no free space left in DA. 
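For example, both views can be gathered in one pass with a rough sketch like the one below. It uses only the commands already mentioned in this thread (mmlsrecoverygroup -L and mmdf); the recovery group and file system names are taken from your output further down, so adjust them for another cluster:

    # Raw free space per declustered array, for each recovery group:
    for rg in BB1RGL BB1RGR; do
        echo "=== $rg ==="
        mmlsrecoverygroup $rg -L | grep -E '^ *(LOG|DA[0-9]+) '   # per-DA summary rows (free space column)
    done

    # Usable, file-system level capacity (i.e. after RAID overhead), per file system:
    for fs in gpfs1 gpfs2 gpfs3 gpfs4 gpfs5; do
        echo "=== $fs ==="
        mmdf $fs
    done
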
[root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low Thanks, Sandeep Naik Elastic Storage server / GPFS Test ETZ-B, Hinjewadi Pune India (+91) 8600994314 From: "Kumaran Rajaram" To: gpfsug main discussion list , atmane khiredine Date: 09/07/2017 10:22 PM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Atmane, >> I can not find the free space Based on your output below, your setup currently has two recovery groups BB1RGL and BB1RGR. Issue "mmlsrecoverygroup BB1RGL -L" and "mmlsrecoverygroup BB1RGR -L" to obtain free space in each DA. Based on your "mmlsrecoverygroup BB1RGL -L" output below, BB1RGL "DA1" has 12GiB and "DA2" has 4GiB free space. The metadataOnly and dataOnly vdisk/NSD are created from DA1 and DA2. declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low In addition, you may use "mmlsnsd" to obtain mapping of file-system to vdisk/NSD + use "mmdf " command to query user or available capacity on a GPFS file system. Hope this helps, -Kums From: atmane khiredine To: Laurence Horrocks-Barlow , "gpfsug main discussion list" Date: 07/09/2017 08:27 AM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org thank you very much for replying. 
I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 
drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jul 17 13:13:58 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 17 Jul 2017 12:13:58 +0000 Subject: [gpfsug-discuss] Job Vacancy: Research Storage Systems Senior Specialist/Specialist Message-ID: Hi all, Members of this group may be particularly interested in the role "Research Storage Systems Senior Specialist/Specialist"... As part of the University of Birmingham's investment in our ability to support outstanding research by providing technical computing facilities, we are expanding the team and currently have 6 vacancies. I've provided a short description of each post, but please do follow the links where you will find the full job description attached at the bottom of the page. For some of the posts, they are graded either at 7 or 8 and will be appointed based upon skills and experience, the expectation is that if the appointment is made at grade 7 that as the successful candidate grows into the role, we should be able to regrade up. 
Research Storage Systems Senior Specialist/Specialist: https://goo.gl/NsL1EG Responsible for the delivery and maintenance of research storage systems, focussed on the delivery of Spectrum Scale storage systems and data protection. (this is available either as a grade 8 or grade 7 post depending on skills and experience so may suit someone wishing to grow into the senior role) HPC Specialist post (Research Systems Administrator / Senior Research Systems Administrator): https://goo.gl/1SxM4j Helping to deliver and operationally support the technical computing environments, with a focus on supporting and delivery of HPC and HTC services. (this is available either as a grade 7 or grade 8 post depending on skills and experience so may suit someone wishing to grow into the senior role) Research Computing (Analytics): https://goo.gl/uCNdMH Helping our researchers to understand data analytics and supporting their research Senior Research Software Engineer: https://goo.gl/dcGgAz Working with research groups to develop and deliver bespoke software solutions to support their research Research Training and Engagement Officer: https://goo.gl/U48m7z Helping with the delivery and coordination of training and engagement works to support users helping ensure they are able to use the facilities to support their research. Research IT Partner in the College of Arts and Law: https://goo.gl/A7czEA Providing technical knowledge and skills to support project delivery through research bid preparation to successful solution delivery. Simon From cgirda at wustl.edu Mon Jul 17 20:40:42 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Mon, 17 Jul 2017 14:40:42 -0500 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana Message-ID: Hello Team, This is Chakri from Washu at STL. Thank you for the great opportunity to join this group. I am trying to setup performance monitoring for our GPFS cluster. As part of the project configured pmcollector and pmsensors on our GPFS cluster. 1. Created a 'spectrumscale' data-source bridge on our grafana ( NOT SET TO DEFAULT ) https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm 2. Created a new dash-board by importing the pre-built dashboard. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Importing%20predefined%20Grafana%20dashboards Here is the issue. I don't get any graph updates if I don't set "spectrumscale" as DEFAULT data-source but that is breaking rest of the graphs ( we have ton of dashboards). So I had to uncheck the "spectrumscale" as default data-source. If I go and set the "data-source" manually to "spectrumscale" on the pre-built dashboard graphs. I see the wheel spinning but no updates. Any ideas? Thank you Chakri From Robert.Oesterlin at nuance.com Tue Jul 18 12:45:38 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 18 Jul 2017 11:45:38 +0000 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana Message-ID: Hi Chakri If you?re getting the ?ole ?spinning wheel? on your dashboard, then it?s one of two things: 1) The Grafana bridge is not running 2) The dashboard is requesting a metric that isn?t available. 
Assuming that you?ve verified that the pmcollector/pmsensor setup is work right in your cluster, I?d then start looking at the log files for the Grafana Bridge and the pmcollector to see if you can determine if either is producing an error - like the metric wasn?t found. The other thing to try is setup a small test graph with a known metric being collected by you pmsensor configuration, rather than try one of Helene?s default dashboards, which are fairly complex. Drop me a note directly if you need to. Bob Oesterlin Sr Principal Storage Engineer, Nuance From cgirda at wustl.edu Tue Jul 18 15:57:05 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Tue, 18 Jul 2017 09:57:05 -0500 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana In-Reply-To: References: Message-ID: Bob, Found the issue to be with https is getting blocked with "direct" connection. Switched it to proxy on the bridge-port. That helped and now I can see graphs. Thank you Chakri On 7/18/17 6:45 AM, Oesterlin, Robert wrote: > Hi Chakri > > If you?re getting the ?ole ?spinning wheel? on your dashboard, then it?s one of two things: > > 1) The Grafana bridge is not running > 2) The dashboard is requesting a metric that isn?t available. > > Assuming that you?ve verified that the pmcollector/pmsensor setup is work right in your cluster, I?d then start looking at the log files for the Grafana Bridge and the pmcollector to see if you can determine if either is producing an error - like the metric wasn?t found. The other thing to try is setup a small test graph with a known metric being collected by you pmsensor configuration, rather than try one of Helene?s default dashboards, which are fairly complex. > > Drop me a note directly if you need to. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Tue Jul 18 18:21:06 2017 From: david_johnson at brown.edu (David Johnson) Date: Tue, 18 Jul 2017 13:21:06 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited Message-ID: We also noticed a fair amount of CPU time accumulated by mmsysmon.py on our diskless compute nodes. I read the earlier query, where it was answered: > ces == Cluster Export Services, mmsysmon.py comes from mmcesmon. It is used for managing export services of GPFS. If it is killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't attempt to kill them. > Our question is this ? we don?t run the latest ?protocols", our NFS is CNFS, and our CIFS is clustered CIFS. I can understand it might be needed with Ganesha, but on every node? Why in the world would I be getting this daemon running on all client nodes, when I didn?t install the ?protocols" version of the distribution? We have release 4.2.2 at the moment. How can we disable this? Thanks, ? ddj -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jul 18 18:51:21 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 18 Jul 2017 17:51:21 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: There?s no official way to cleanly disable it so far as I know yet; but you can defacto disable it by deleting /var/mmfs/mmsysmon/mmsysmonitor.conf. It?s a huge problem. 
I don?t understand why it hasn?t been given much credit by dev or support. ~jonathon On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of David Johnson" wrote: We also noticed a fair amount of CPU time accumulated by mmsysmon.py on our diskless compute nodes. I read the earlier query, where it was answered: ces == Cluster Export Services, mmsysmon.py comes from mmcesmon. It is used for managing export services of GPFS. If it is killed, your nfs/smb etc will be out of work. Their overhead is small and they are very important. Don't attempt to kill them. Our question is this ? we don?t run the latest ?protocols", our NFS is CNFS, and our CIFS is clustered CIFS. I can understand it might be needed with Ganesha, but on every node? Why in the world would I be getting this daemon running on all client nodes, when I didn?t install the ?protocols" version of the distribution? We have release 4.2.2 at the moment. How can we disable this? Thanks, ? ddj From S.J.Thompson at bham.ac.uk Tue Jul 18 20:21:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 18 Jul 2017 19:21:46 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: So just following up on my questions from January. We tried to do 2. I.e. Restore to a new file-system with different block sizes. It got part way through creating the file-sets on the new SOBAR file-system and then GPFS asserts and crashes... We weren't actually intentionally trying to move block sizes, but because we were restoring from a traditional SAN based system to a shiny new GNR based system, we'd manually done the FS create steps. I have a PMR open now. I don't know if someone internally in IBM actually tried this after my emails, as apparently there is a similar internal defect which is ~6 months old... Simon From: > on behalf of Marc A Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 20 January 2017 at 17:57 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. From: "Simon Thompson (Research Computing - IT Services)" > To: "gpfsug-discuss at spectrumscale.org" > Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? 
We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From leslie.james.elliott at gmail.com Wed Jul 19 08:22:49 2017 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Wed, 19 Jul 2017 17:22:49 +1000 Subject: [gpfsug-discuss] AFM over NFS Message-ID: we are having a problem linking a target to a fileset we are able to manually connect with NFSv4 to the correct path on an NFS export down a particular subdirectory path, but when when we create a fileset with this same path as an afmTarget it connects with NFSv3 and actually connects to the top of the export even though mmafmctl displays the extended path information are we able to tell AFM to connect with NFSv4 in any way to work around this problem the NFS comes from a closed system, we can not change the configuration on it to fix the problem on the target thanks leslie -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Wed Jul 19 08:53:58 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 19 Jul 2017 07:53:58 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: I?m having a play with this now too. Has anybody coded a systemd unit to handle step 2b in the knowledge centre article ? bridge creation on the gpfs side? It would save me a bit of effort. I?m also wondering about the CherryPy version. It looks like this has been developed on SLES which has the newer version mentioned as a standard package and yet RHEL with an older version of CherryPy is perhaps more common as it seems to have the best support for features of GPFS, like object and block protocols. Maybe SLES is in favour now? 
Cheers, Greg From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Thursday, 6 July 2017 3:07 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Greetings, I'm currently setting up Grafana to interact with one of our Scale Clusters and i've followed the knowledge centre link in terms of setup. https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm However while everything appears to be working i'm not seeing any data coming through the reports within the grafana server, even though I can see data in the Scale GUI The current environment: [root at sc01n02 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: sc01.spectrum GPFS cluster id: 18085710661892594990 GPFS UID domain: sc01.spectrum Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------ 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon [root at sc01n02 ~]# [root at sc01n02 ~]# mmlsconfig Configuration data for cluster sc01.spectrum: --------------------------------------------- clusterName sc01.spectrum clusterId 18085710661892594990 autoload yes profile gpfsProtocolDefaults dmapiFileHandleSize 32 minReleaseLevel 4.2.2.0 ccrEnabled yes cipherList AUTHONLY maxblocksize 16M [cesNodes] maxMBpS 5000 numaMemoryInterleave yes enforceFilesetQuotaOnRoot yes workerThreads 512 [common] tscCmdPortRange 60000-61000 cesSharedRoot /ibm/cesSharedRoot/ces cifsBypassTraversalChecking yes syncSambaMetadataOps yes cifsBypassShareLocksOnRename yes adminMode central File systems in cluster sc01.spectrum: -------------------------------------- /dev/cesSharedRoot /dev/icos_demo /dev/scale01 [root at sc01n02 ~]# [root at sc01n02 ~]# systemctl status pmcollector ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. Loaded: loaded (/etc/rc.d/init.d/pmcollector) Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago Docs: man:systemd-sysv-generator(8) Main PID: 2693 (ZIMonCollector) CGroup: /system.slice/pmcollector.service ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance mon...... May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor collector... May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance moni...r.. Hint: Some lines were ellipsized, use -l to show in full. From Grafana Server: [cid:image002.jpg at 01D300B7.CFE73E50] when I send a set of files to the cluster (3.8GB) I can see performance metrics within the Scale GUI [cid:image004.jpg at 01D300B7.CFE73E50] yet from the Grafana Dashboard im not seeing any data points [cid:image006.jpg at 01D300B7.CFE73E50] Can anyone provide some hints as to what might be happening? Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 19427 bytes Desc: image002.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 84412 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 37285 bytes Desc: image006.jpg URL: From janfrode at tanso.net Wed Jul 19 12:09:48 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 19 Jul 2017 11:09:48 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: Nils Haustein did such a migration from v7000 Unified to ESS last year. Used SOBAR to avoid recalls from HSM. I believe he wrote a whitepaper on the process.. -jf tir. 18. jul. 2017 kl. 21.21 skrev Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk>: > So just following up on my questions from January. > > We tried to do 2. I.e. Restore to a new file-system with different block > sizes. It got part way through creating the file-sets on the new SOBAR > file-system and then GPFS asserts and crashes... We weren't actually > intentionally trying to move block sizes, but because we were restoring > from a traditional SAN based system to a shiny new GNR based system, we'd > manually done the FS create steps. > > I have a PMR open now. I don't know if someone internally in IBM actually > tried this after my emails, as apparently there is a similar internal > defect which is ~6 months old... > > Simon > > From: on behalf of Marc A > Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: Friday, 20 January 2017 at 17:57 > > To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions > > I worked on some aspects of SOBAR, but without studying and testing the > commands - I'm not in a position right now to give simple definitive > answers - > having said that.... > > Generally your questions are reasonable and the answer is: "Yes it should > be possible to do that, but you might be going a bit beyond the design > point.., > so you'll need to try it out on a (smaller) test system with some smaller > tedst files. > > Point by point. > > 1. If SOBAR is unable to restore a particular file, perhaps because the > premigration did not complete -- you should only lose that particular file, > and otherwise "keep going". > > 2. I think SOBAR helps you build a similar file system to the original, > including block sizes. So you'd have to go in and tweak the file system > creation step(s). > I think this is reasonable... If you hit a problem... IMO that would be a > fair APAR. > > 3. Similar to 2. > > > > > > From: "Simon Thompson (Research Computing - IT Services)" < > S.J.Thompson at bham.ac.uk> > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 01/20/2017 10:44 AM > Subject: [gpfsug-discuss] SOBAR questions > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We've recently been looking at deploying SOBAR to support DR of some of > our file-systems, I have some questions (as ever!) that I can't see are > clearly documented, so was wondering if anyone has any insight on this. > > 1. If we elect not to premigrate certain files, are we still able to use > SOBAR? 
We are happy to take a hit that those files will never be available > again, but some are multi TB files which change daily and we can't stream > to tape effectively. > > 2. When doing a restore, does the block size of the new SOBAR'd to > file-system have to match? For example the old FS was 1MB blocks, the new > FS we create with 2MB blocks. Will this work (this strikes me as one way > we might be able to migrate an FS to a new block size?)? > > 3. If the file-system was originally created with an older GPFS code but > has since been upgraded, does restore work, and does it matter what client > code? E.g. We have a file-system that was originally 3.5.x, its been > upgraded over time to 4.2.2.0. Will this work if the client code was say > 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 > (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file > system version". Say there was 4.2.2.5 which created version 16.01 > file-system as the new FS, what would happen? > > This sort of detail is missing from: > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s > cale.v4r22.doc/bl1adv_sobarrestore.htm > > But is probably quite important for us to know! > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 19 12:26:43 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 19 Jul 2017 11:26:43 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: Getting this: python zimonGrafanaIntf.py ?s < pmcollector host> via system is a bit of a tricky process, since this process will abort unless the pmcollector is fully up. With a large database, I?ve seen it take 3-5 mins for pmcollector to fully initialize. I?m sure a simple ?sleep and try again? wrapper would take care of that. It?s on my lengthy to-do list! On the CherryPy version - I run the bridge on my RH/Centos system with python 3.4 and used ?pip install cherrypy? and it picked up the latest version. Seems to work just fine. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Greg.Lehmann at csiro.au" Reply-To: gpfsug main discussion list Date: Wednesday, July 19, 2017 at 2:54 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data I?m having a play with this now too. Has anybody coded a systemd unit to handle step 2b in the knowledge centre article ? bridge creation on the gpfs side? It would save me a bit of effort. I?m also wondering about the CherryPy version. It looks like this has been developed on SLES which has the newer version mentioned as a standard package and yet RHEL with an older version of CherryPy is perhaps more common as it seems to have the best support for features of GPFS, like object and block protocols. Maybe SLES is in favour now? -------------- next part -------------- An HTML attachment was scrubbed... 
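To expand on the "sleep and try again" idea: a minimal wrapper (untested sketch; the bridge path is an assumption, and 9084 is the collector query port the bridge complains about when it cannot connect) can simply poll the collector port before launching the bridge:

#!/bin/bash
# wait-for-pmcollector.sh <collector-host>  -- hypothetical helper around zimonGrafanaIntf.py
COLLECTOR=${1:-localhost}
# loop until the pmcollector query port (9084) accepts a TCP connection
until (exec 3<>"/dev/tcp/${COLLECTOR}/9084") 2>/dev/null; do
    echo "pmcollector on ${COLLECTOR} not answering on 9084 yet, sleeping 30s..."
    sleep 30
done
# collector is up, hand over to the bridge
exec python /opt/IBM/zimon/zimonGrafanaIntf/zimonGrafanaIntf.py -s "${COLLECTOR}"

The same effect can be had from systemd alone with Restart=on-failure and a RestartSec of a minute or so, since the bridge simply exits when it cannot reach the collector.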
URL: From MDIETZ at de.ibm.com Wed Jul 19 14:05:49 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Wed, 19 Jul 2017 15:05:49 +0200 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. > > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? 
ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Wed Jul 19 14:28:23 2017 From: david_johnson at brown.edu (David Johnson) Date: Wed, 19 Jul 2017 09:28:23 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: I have opened a PMR, and the official response reflects what you just posted. In addition, it seems there are some performance issues with Python 2 that will be improved with eventual migration to Python 3. I was unaware of the mmhealth functions that the mmsysmon daemon provides. The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded. I suppose it would be possible to turn off mmsysmon during the benchmarking, but I appreciate the effort at streamlining the monitor service. Cutting back on fork/exec, better python, less polling, more notifications? all good. Thanks for the details, ? ddj > On Jul 19, 2017, at 9:05 AM, Mathias Dietz wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. 
> > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdharris at us.ibm.com Wed Jul 19 15:40:17 2017 From: mdharris at us.ibm.com (Michael D Harris) Date: Wed, 19 Jul 2017 10:40:17 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: Hi David, Re: "The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded." MPI workloads show the most mmhealth impact. Specifically the more sensitive the workload is to jitter the higher the potential impact. The mmhealth config interval, as per Mathias's link, is a scalar applied to all monitor interval values in the configuration file. As such it currently modifies the server side monitoring and health reporting in addition to mitigating mpi client impact. So "medium" == 5 is a good perhaps reasonable value - whereas the "slow" == 10 scalar may be too infrequent for your server side monitoring and reporting (so your 30 second update becomes 5 minutes). The clock alignment that Mathias mentioned is a new investigatory undocumented tool for MPI workloads. It nearly completely removes all mmhealth MPI jitter while retaining default monitor intervals. It also naturally generates thundering herds of all client reporting to the quorum nodes. So while you may mitigate the client MPI jitter you may severely impact the server throughput on those intervals if not also exceed connection and thread limits. Configuring "clients" separately from "servers" without resorting to alignment is another area of investigation. I'm not familiar with your PMR but as Mathias mentioned "mmhealth config interval medium" would be a good start. In testing that Kums and I have done the "mmhealth config interval medium" value provides mitigation almost as good as the mentioned clock alignment for MPI for say a psnap with barrier type workload . 
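For reference, the interval change itself is a one-liner; "medium" and "slow" are the scalars discussed above (5x and 10x the default polling intervals) - check the mmhealth man page on your release for the exact set of accepted keywords:

# scale all mmsysmon polling intervals by 5 (the suggested starting point)
mmhealth config interval medium

# 10x polling: less client-side jitter, but server-side health reporting
# becomes correspondingly less frequent
mmhealth config interval slow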
Regards, Mike Harris IBM Spectrum Scale - Core Team From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/19/2017 09:28 AM Subject: gpfsug-discuss Digest, Vol 66, Issue 30 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: mmsysmon.py revisited (Mathias Dietz) 2. Re: mmsysmon.py revisited (David Johnson) ---------------------------------------------------------------------- Message: 1 Date: Wed, 19 Jul 2017 15:05:49 +0200 From: "Mathias Dietz" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited Message-ID: Content-Type: text/plain; charset="iso-8859-1" thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. 
> > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/8c0e33e9/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 19 Jul 2017 09:28:23 -0400 From: David Johnson To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited Message-ID: Content-Type: text/plain; charset="utf-8" I have opened a PMR, and the official response reflects what you just posted. In addition, it seems there are some performance issues with Python 2 that will be improved with eventual migration to Python 3. I was unaware of the mmhealth functions that the mmsysmon daemon provides. The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded. I suppose it would be possible to turn off mmsysmon during the benchmarking, but I appreciate the effort at streamlining the monitor service. Cutting back on fork/exec, better python, less polling, more notifications? all good. Thanks for the details, ? ddj > On Jul 19, 2017, at 9:05 AM, Mathias Dietz wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. 
(mmhealth config interval) > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm < https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss < http://gpfsug.org/mailman/listinfo/gpfsug-discuss> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/669c525b/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 66, Issue 30 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathon.anderson at colorado.edu Wed Jul 19 18:52:14 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 19 Jul 2017 17:52:14 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? ~jonathon On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. 
> > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Wed Jul 19 19:12:37 2017 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Wed, 19 Jul 2017 14:12:37 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. 
> > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. >> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. >> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Wed Jul 19 19:29:22 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 19 Jul 2017 18:29:22 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> References: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> Message-ID: OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU for packet processing, whereas Mellanox IB uses a discrete asic on the HBA. As a result, OPA is much more sensitive to task placement and interrupts, in our experience, because the host CPU load competes with the fabric IO processing load. 
~jonathon On 7/19/17, 12:12 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu" wrote: We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. 
>> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. >> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From john.hearns at asml.com Thu Jul 20 08:39:29 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 20 Jul 2017 07:39:29 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> Message-ID: This is really interesting. I know we can look at the interrupt rates of course, but is there a way we can quantify the effects of interrupts / OS jitter here? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: Wednesday, July 19, 2017 8:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU for packet processing, whereas Mellanox IB uses a discrete asic on the HBA. As a result, OPA is much more sensitive to task placement and interrupts, in our experience, because the host CPU load competes with the fabric IO processing load. ~jonathon On 7/19/17, 12:12 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu" wrote: We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. 
Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_4.2.3%2Fcom.ibm.spectrum.scale.v4r23.doc%2Fbl1adm_mmhealth.htm&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=Uzdg4ogcQwidNfi8TMp%2FdCMqnSLTFxU4y8n2ub%2F28xQ%3D&reserved=0 > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. >> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. 
>> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From MDIETZ at de.ibm.com Thu Jul 20 10:30:50 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Thu, 20 Jul 2017 11:30:50 +0200 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: Jonathon, its important to separate the two issues "high CPU consumption" and "CPU Jitter". As mentioned, we are aware of the CPU jitter issue and already put several improvements in place. (more to come with the next release) Did you try with a lower polling frequency and/or enabling clock alignment as Mike suggested ? Non-MPI workloads are usually not impacted by CPU jitter, but might be impacted by high CPU consumption. 
But we don't see such such high CPU consumption in the lab and therefore ask affected customers to get in contact with IBM support to find the root cause. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/19/2017 07:52:14 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/19/2017 07:52 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > It might be a problem specific to your system environment or a > wrong configuration therefore please get in contact with IBM support > to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing > jitter in conflict with MPI on the shared Intel Omni-Path network, > in our case. > > We?ve already tried pursuing support on this through our vendor, > DDN, and got no-where. Eventually we were the ones who tried killing > mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU > consumption by mmsysmon on our test systems? isn?t helping. Do you > have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Mathias Dietz" on behalf of MDIETZ at de.ibm.com> wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for > the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because > it monitors the individual components and provides health state > information and error events. > > This information is needed by other Spectrum Scale components > (mmhealth command, the IBM Spectrum Scale GUI, Support tools, > Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > > much credit by dev or support. > > Over the last couple of month, the development team has put a > strong focus on this topic. > > In order to monitor the health of the individual components, > mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and > replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the > ability to configure the polling frequency to reduce the overhead. > (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/ > com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the > monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by > mmsysmon on our test systems. > > It might be a problem specific to your system environment or a > wrong configuration therefore please get in contact with IBM support > to analyze the root cause of the high usage. 
> > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by > mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on > every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 15:57:14 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 09:57:14 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR Message-ID: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> Hi There, I was running a bridge port services to push my stats to grafana. It was running fine until we started some rigorous IOPS testing on the cluster. Now its failing to start with the following error. Questions: 1. Any clues on it fix? 2. Is there anyway I can run this in a service/daemon mode rather than running in a screen session? 
[root at linuscs107 zimonGrafanaIntf]# python zimonGrafanaIntf.py -s linuscs107.gsc.wustl.edu Failed to initialize MetadataHandler, please check log file for reason #cat pmmonitor.log 2017-07-20 09:41:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 09:41:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 09:41:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting Thank you Chakri From Robert.Oesterlin at nuance.com Thu Jul 20 16:06:48 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 20 Jul 2017 15:06:48 +0000 Subject: [gpfsug-discuss] mmsysmon and CCR Message-ID: I recently ran into an issue where the frequency of mmsysmon polling (GPFS 4.2.2) was causing issues with CCR updates. I eventually ended decreasing the polling interval to 30 mins (I don?t have any CES) which seemed to solve the issue. So, if you have a large cluster, be on the lookout for CCR issues, if you have that configured. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 17:38:25 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 11:38:25 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> References: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> Message-ID: <31b9b441-f51c-c0d1-11e0-b01a070f9e4e@wustl.edu> cat zserver.log 2017-07-20 11:21:59,001 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) 2017-07-20 11:32:29,090 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) Thank you Chakri On 7/20/17 9:57 AM, Chakravarthy Girda wrote: > Hi There, > > I was running a bridge port services to push my stats to grafana. It > was running fine until we started some rigorous IOPS testing on the > cluster. Now its failing to start with the following error. > > Questions: > > 1. Any clues on it fix? > 2. Is there anyway I can run this in a service/daemon mode rather than > running in a screen session? > > > [root at linuscs107 zimonGrafanaIntf]# python zimonGrafanaIntf.py -s > linuscs107.gsc.wustl.edu > Failed to initialize MetadataHandler, please check log file for reason > > #cat pmmonitor.log > > 2017-07-20 09:41:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 09:41:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 09:41:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > > > Thank you > Chakri > > > > > From Robert.Oesterlin at nuance.com Thu Jul 20 17:50:12 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 20 Jul 2017 16:50:12 +0000 Subject: [gpfsug-discuss] pmmonitor - ERROR Message-ID: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> This looks like the Grafana bridge could not connect to the pmcollector process - is it running normally? See if some of the normal ?mmprefmon? 
commands work and/or look at the log file on the pmcollector node. (under /var/log/zimon) You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) Bob Oesterlin Sr Principal Storage Engineer, Nuance On 7/20/17, 11:38 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Chakravarthy Girda" wrote: 2017-07-20 11:32:29,090 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) From mdharris at us.ibm.com Thu Jul 20 17:55:56 2017 From: mdharris at us.ibm.com (Michael D Harris) Date: Thu, 20 Jul 2017 12:55:56 -0400 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 66, Issue 34 In-Reply-To: References: Message-ID: Hi Bob, The CCR monitor interval is addressed in 4.2.3 or 4.2.3 ptf1 Regards, Mike Harris Spectrum Scale Development - Core Team -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 18:12:09 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 12:12:09 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> Message-ID: Bob, Your correct. Found the issues with pmcollector services. Fixed issues with pmcollector, resolved the issues. Thank you Chakri On 7/20/17 11:50 AM, Oesterlin, Robert wrote: > You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) From cgirda at wustl.edu Thu Jul 20 18:30:03 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 12:30:03 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> Message-ID: <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> Bob, Actually the pmcollector service died in 5min. 2017-07-20 12:11:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:16:29,470 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:16:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:21:29,384 - pmmonitor - ERROR - QueryHandler: Socket connection broken, received no data 2017-07-20 12:21:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 12:21:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 12:21:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting Thank you Chakri On 7/20/17 12:12 PM, Chakravarthy Girda wrote: > Bob, > > Your correct. Found the issues with pmcollector services. Fixed issues > with pmcollector, resolved the issues. > > > Thank you > > Chakri > > > On 7/20/17 11:50 AM, Oesterlin, Robert wrote: >> You will also see this node when the pmcollector process is still initializing. 
(reading in the existing database, not ready to service requests) From cgirda at wustl.edu Thu Jul 20 21:03:56 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 15:03:56 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> Message-ID: For now I switched the "zimonGrafanaIntf" to port "4262". So far it didn't crash the pmcollector. Will wait for some more time to ensure its working. * Can we start this process in a daemon or service mode? Thank you Chakri On 7/20/17 12:30 PM, Chakravarthy Girda wrote: > Bob, > > Actually the pmcollector service died in 5min. > > 2017-07-20 12:11:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:16:29,470 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:16:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:21:29,384 - pmmonitor - ERROR - QueryHandler: Socket > connection broken, received no data > 2017-07-20 12:21:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 12:21:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 12:21:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > > Thank you > Chakri > > > On 7/20/17 12:12 PM, Chakravarthy Girda wrote: >> Bob, >> >> Your correct. Found the issues with pmcollector services. Fixed issues >> with pmcollector, resolved the issues. >> >> >> Thank you >> >> Chakri >> >> >> On 7/20/17 11:50 AM, Oesterlin, Robert wrote: >>> You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From cgirda at wustl.edu Thu Jul 20 21:42:09 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 15:42:09 -0500 Subject: [gpfsug-discuss] zimonGrafanaIntf template variable Message-ID: <00372fdc-a0b7-26ac-84c1-aa32c78e4261@wustl.edu> Hi, I imported the pre-built grafana dashboard. https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/a180eb7e-9161-4e07-a6e4-35a0a076f7b3/attachment/5e9a5886-5bd9-4a6f-919e-bc66d16760cf/media/default%20dashboards%20set.zip Get updates from few graphs but not all. I realize that I need to update the template variables. Eg:- I get into the "File Systems View" Variable ( gpfsMetrics_fs1 ) --> Query ( gpfsMetrics_fs1 ) Regex ( /.*[^gpfs_fs_inode_used|gpfs_fs_inode_alloc|gpfs_fs_inode_free|gpfs_fs_inode_max]/ ) Question: * How can I execute the above Query and regex to fix the issues. * Is there any document on CLI options? Thank you Chakri -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jul 21 22:13:17 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 21 Jul 2017 17:13:17 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Message-ID: <28986.1500671597@turing-police.cc.vt.edu> So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive service. 
Inode size is 4K, and we had a requirement to encrypt-at-rest, so encryption is in play as well. Data is replicated 2x and fragment size is 32K. I was investigating how much data-in-inode would help deal with users who put large trees of small files into the archive (yes, I know we can use applypolicy with external programs to tarball offending directories, but that's a separate discussion ;) ## ls -ls * 64 -rw-r--r-- 1 root root 2048 Jul 21 14:47 random.data 64 -rw-r--r-- 1 root root 512 Jul 21 14:48 random.data.1 64 -rw-r--r-- 1 root root 128 Jul 21 14:50 random.data.2 64 -rw-r--r-- 1 root root 32 Jul 21 14:50 random.data.3 64 -rw-r--r-- 1 root root 16 Jul 21 14:50 random.data.4 Hmm.. I was expecting at least *some* of these to fit in the inode, and not take 2 32K blocks... ## mmlsattr -d -L random.data.4 file name: random.data.4 metadata replication: 2 max 2 data replication: 2 max 2 immutable: no appendOnly: no flags: storage pool name: system fileset name: root snapshot name: creation time: Fri Jul 21 14:50:51 2017 Misc attributes: ARCHIVE Encrypted: yes gpfs.Encryption: 0x4541 (... another 296 hex digits) EncPar 'AES:256:XTS:FEK:HMACSHA512' type: wrapped FEK WrpPar 'AES:KWRAP' CmbPar 'XORHMACSHA512' KEY-97c7f4b7-06cb-4a53-b317-1c187432dc62:archKEY1_gpfsG1 Hmm.. Doesn't *look* like enough extended attributes to prevent storing even 16 bytes in the inode, should be room for around 3.5K minus the above 250 bytes or so of attributes.... What am I missing here? Does "encrypted" or LTFS/EE disable data-in-inode? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From oehmes at gmail.com Fri Jul 21 23:04:32 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 21 Jul 2017 22:04:32 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <28986.1500671597@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: Hi, i talked with a few others to confirm this, but unfortunate this is a limitation of the code today (maybe not well documented which we will look into). Encryption only encrypts data blocks, it doesn't encrypt metadata. Hence, if encryption is enabled, we don't store data in the inode, because then it wouldn't be encrypted. For the same reason HAWC and encryption are incompatible. Sven On Fri, Jul 21, 2017 at 2:13 PM wrote: > So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive > service. > Inode size is 4K, and we had a requirement to encrypt-at-rest, so > encryption > is in play as well. Data is replicated 2x and fragment size is 32K. > > I was investigating how much data-in-inode would help deal with users who > put > large trees of small files into the archive (yes, I know we can use > applypolicy > with external programs to tarball offending directories, but that's a > separate > discussion ;) > > ## ls -ls * > 64 -rw-r--r-- 1 root root 2048 Jul 21 14:47 random.data > 64 -rw-r--r-- 1 root root 512 Jul 21 14:48 random.data.1 > 64 -rw-r--r-- 1 root root 128 Jul 21 14:50 random.data.2 > 64 -rw-r--r-- 1 root root 32 Jul 21 14:50 random.data.3 > 64 -rw-r--r-- 1 root root 16 Jul 21 14:50 random.data.4 > > Hmm.. I was expecting at least *some* of these to fit in the inode, and > not take 2 32K blocks... 
> > ## mmlsattr -d -L random.data.4 > file name: random.data.4 > metadata replication: 2 max 2 > data replication: 2 max 2 > immutable: no > appendOnly: no > flags: > storage pool name: system > fileset name: root > snapshot name: > creation time: Fri Jul 21 14:50:51 2017 > Misc attributes: ARCHIVE > Encrypted: yes > gpfs.Encryption: 0x4541 (... another 296 hex digits) > EncPar 'AES:256:XTS:FEK:HMACSHA512' > type: wrapped FEK WrpPar 'AES:KWRAP' CmbPar 'XORHMACSHA512' > KEY-97c7f4b7-06cb-4a53-b317-1c187432dc62:archKEY1_gpfsG1 > > Hmm.. Doesn't *look* like enough extended attributes to prevent storing > even > 16 bytes in the inode, should be room for around 3.5K minus the above 250 > bytes > or so of attributes.... > > What am I missing here? Does "encrypted" or LTFS/EE disable data-in-inode? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jul 21 23:24:13 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 21 Jul 2017 18:24:13 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <33069.1500675853@turing-police.cc.vt.edu> On Fri, 21 Jul 2017 22:04:32 -0000, Sven Oehme said: > i talked with a few others to confirm this, but unfortunate this is a > limitation of the code today (maybe not well documented which we will look > into). Encryption only encrypts data blocks, it doesn't encrypt metadata. > Hence, if encryption is enabled, we don't store data in the inode, because > then it wouldn't be encrypted. For the same reason HAWC and encryption are > incompatible. I can live with that restriction if it's documented better, thanks... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From p.childs at qmul.ac.uk Mon Jul 24 10:29:49 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 09:29:49 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. Message-ID: <1500888588.571.3.camel@qmul.ac.uk> We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. From ilan84 at gmail.com Mon Jul 24 11:36:41 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Mon, 24 Jul 2017 13:36:41 +0300 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication Message-ID: Hi, I have gpfs with 2 Nodes (redhat). 
I am trying to create NFS share - So I would be able to mount and access it from another linux machine. I receive error: Current authentication: none is invalid. What do i need to configure ? PLEASE NOTE: I dont have the SMB package at the moment, I dont want authentication on the NFS export.. While trying to create NFS (I execute the following): [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" I receive the following error: [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmuserauth service list FILE access not configured PARAMETERS VALUES ------------------------------------------------- OBJECT access not configured PARAMETERS VALUES ------------------------------------------------- [root at LH20-GPFS1 ~]# Some additional information on cluster: ============================== [root at LH20-GPFS1 ~]# mmlsmgr file system manager node ---------------- ------------------ fs_gpfs01 10.10.158.61 (LH20-GPFS1) Cluster manager node: 10.10.158.61 (LH20-GPFS1) [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: LH20-GPFS1 GPFS cluster id: 10777108240438931454 GPFS UID domain: LH20-GPFS1 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 quorum From jonathan at buzzard.me.uk Mon Jul 24 12:43:10 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Jul 2017 12:43:10 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <28986.1500671597@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <1500896590.4387.167.camel@buzzard.me.uk> On Fri, 2017-07-21 at 17:13 -0400, valdis.kletnieks at vt.edu wrote: > So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive service. > Inode size is 4K, and we had a requirement to encrypt-at-rest, so encryption > is in play as well. Data is replicated 2x and fragment size is 32K. > For an archive service how about only accepting files in actual "archive" formats and then severely restricting the number of files a user can have? By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. Has a number of effects. Firstly it makes the files "big" so they move to tape efficiently. It also makes it less likely the end user will try and use it as an general purpose file server. As it's an archive there should be no problem for the user to bundle all the files into a .zip file or similar. Noting that Windows Vista and up handle ZIP64 files getting around the older 4GB and 65k files limit. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
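To make that concrete: on the user side it is just a bundling step, and on the service side a file-count (inode) quota does the policing. The file system name, user and limits below are invented for illustration, and the mmsetquota syntax should be double checked against the release in use:

# user side: one archive file instead of a tree of small files
tar -czf mydata_2017.tar.gz mydata/

# service side (sketch): cap how many files a user may own in the
# archive file system - "arcfs", "jbloggs" and the numbers are examples
mmsetquota arcfs --user jbloggs --files 1000:2000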
From stefan.dietrich at desy.de Mon Jul 24 13:19:47 2017 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Mon, 24 Jul 2017 14:19:47 +0200 (CEST) Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: <1981958989.2609398.1500898787132.JavaMail.zimbra@desy.de> Yep, have look at this Gist [1] The unit files assumes some paths and users, which are created during the installation of my RPM. [1] https://gist.github.com/stdietrich/b3b985f872ea648d6c03bb6249c44e72 Regards, Stefan ----- Original Message ----- > From: "Greg Lehmann" > To: gpfsug-discuss at spectrumscale.org > Sent: Wednesday, July 19, 2017 9:53:58 AM > Subject: Re: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data > I?m having a play with this now too. Has anybody coded a systemd unit to handle > step 2b in the knowledge centre article ? bridge creation on the gpfs side? It > would save me a bit of effort. > > > > I?m also wondering about the CherryPy version. It looks like this has been > developed on SLES which has the newer version mentioned as a standard package > and yet RHEL with an older version of CherryPy is perhaps more common as it > seems to have the best support for features of GPFS, like object and block > protocols. Maybe SLES is in favour now? > > > > Cheers, > > > > Greg > > > > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie > Sent: Thursday, 6 July 2017 3:07 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no > data > > > > > Greetings, > > > > > > > > > I'm currently setting up Grafana to interact with one of our Scale Clusters > > > and i've followed the knowledge centre link in terms of setup. 
> > > > > > [ > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm > | > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm > ] > > > > > > However while everything appears to be working i'm not seeing any data coming > through the reports within the grafana server, even though I can see data in > the Scale GUI > > > > > > The current environment: > > > > > > [root at sc01n02 ~]# mmlscluster > > > GPFS cluster information > ======================== > GPFS cluster name: sc01.spectrum > GPFS cluster id: 18085710661892594990 > GPFS UID domain: sc01.spectrum > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > > Node Daemon node name IP address Admin node name Designation > ------------------------------------------------------------------ > 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon > 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon > 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon > > > [root at sc01n02 ~]# > > > > > > > > > [root at sc01n02 ~]# mmlsconfig > Configuration data for cluster sc01.spectrum: > --------------------------------------------- > clusterName sc01.spectrum > clusterId 18085710661892594990 > autoload yes > profile gpfsProtocolDefaults > dmapiFileHandleSize 32 > minReleaseLevel 4.2.2.0 > ccrEnabled yes > cipherList AUTHONLY > maxblocksize 16M > [cesNodes] > maxMBpS 5000 > numaMemoryInterleave yes > enforceFilesetQuotaOnRoot yes > workerThreads 512 > [common] > tscCmdPortRange 60000-61000 > cesSharedRoot /ibm/cesSharedRoot/ces > cifsBypassTraversalChecking yes > syncSambaMetadataOps yes > cifsBypassShareLocksOnRename yes > adminMode central > > > File systems in cluster sc01.spectrum: > -------------------------------------- > /dev/cesSharedRoot > /dev/icos_demo > /dev/scale01 > [root at sc01n02 ~]# > > > > > > > > > [root at sc01n02 ~]# systemctl status pmcollector > ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. > Loaded: loaded (/etc/rc.d/init.d/pmcollector) > Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago > Docs: man:systemd-sysv-generator(8) > Main PID: 2693 (ZIMonCollector) > CGroup: /system.slice/pmcollector.service > ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... > ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... > > > May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance > mon...... > May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor > collector... > May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance > moni...r.. > Hint: Some lines were ellipsized, use -l to show in full. > > > > > > From Grafana Server: > > > > > > > > > > > > > > > when I send a set of files to the cluster (3.8GB) I can see performance metrics > within the Scale GUI > > > > > > > > > > > > yet from the Grafana Dashboard im not seeing any data points > > > > > > > > > > > > Can anyone provide some hints as to what might be happening? 
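A couple of quick checks usually narrow this kind of "GUI shows data, Grafana shows nothing" problem down. The port number is the default query port seen elsewhere in this digest and may differ in ZIMonCollector.cfg, and the metric name is only an example:

# is the collector process up and answering queries at all?
systemctl status pmcollector
ss -ltn | grep 9084

# does a direct query return data for a known metric?
# (check mmperfmon's help for the exact option syntax on your release)
mmperfmon query cpu_user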
> > > > > > > > > > > > Regards, > > > > > > > > > Andrew Beattie > > > Software Defined Storage - IT Specialist > > > Phone: 614-2133-7927 > > > E-mail: [ mailto:abeattie at au1.ibm.com | abeattie at au1.ibm.com ] > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jjdoherty at yahoo.com Mon Jul 24 14:11:12 2017 From: jjdoherty at yahoo.com (Jim Doherty) Date: Mon, 24 Jul 2017 13:11:12 +0000 (UTC) Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> Message-ID: <261384244.3866909.1500901872347@mail.yahoo.com> There are 3 places that the GPFS mmfsd uses memory? the pagepool? plus 2 shared memory segments.?? To see the memory utilization of the shared memory segments run the command?? mmfsadm dump malloc .??? The statistics for memory pool id 2 is where? maxFilesToCache/maxStatCache objects are? and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.?? You might want to upgrade to later PTF? as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.?? On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 24 14:30:49 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 13:30:49 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <261384244.3866909.1500901872347@mail.yahoo.com> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> Message-ID: <1500903047.571.7.camel@qmul.ac.uk> I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. 
[root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjdoherty at yahoo.com Mon Jul 24 15:10:45 2017 From: jjdoherty at yahoo.com (Jim Doherty) Date: Mon, 24 Jul 2017 14:10:45 +0000 (UTC) Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1500903047.571.7.camel@qmul.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> Message-ID: <1770436429.3911327.1500905445052@mail.yahoo.com> How are you identifying? the high memory usage???? On Monday, July 24, 2017 9:30 AM, Peter Childs wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. 
The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 24 15:21:27 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 14:21:27 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why.
In-Reply-To: <1770436429.3911327.1500905445052@mail.yahoo.com> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> Message-ID: <1500906086.571.9.camel@qmul.ac.uk> top but ps gives the same value. [root at dn29 ~]# ps auww -q 4444 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 4444 2.7 22.3 10537600 5472580 ? S wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. 
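For comparing the kernel's view of mmfsd with what GPFS accounts for itself, something along these lines gives the whole picture in one go (standard Linux tools plus the Scale commands already used above):

# kernel view of the daemon
ps -C mmfsd -o pid,vsz,rss,comm
pmap -x $(pidof mmfsd) | tail -1

# GPFS's own accounting: heap, shared segment and token manager pools
/usr/lpp/mmfs/bin/mmdiag --memory

# pagepool setting for reference
/usr/lpp/mmfs/bin/mmlsconfig pagepool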
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.huffman at crick.ac.uk Mon Jul 24 15:40:51 2017 From: adam.huffman at crick.ac.uk (Adam Huffman) Date: Mon, 24 Jul 2017 14:40:51 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1500906086.571.9.camel@qmul.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> <1500906086.571.9.camel@qmul.ac.uk> Message-ID: <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> smem is recommended here Cheers, Adam -- Adam Huffman Senior HPC and Cloud Systems Engineer The Francis Crick Institute 1 Midland Road London NW1 1AT T: 020 3796 1175 E: adam.huffman at crick.ac.uk W: www.crick.ac.uk On 24 Jul 2017, at 15:21, Peter Childs > wrote: top but ps gives the same value. [root at dn29 ~]# ps auww -q 4444 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 4444 2.7 22.3 10537600 5472580 ? S> wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. 
On Monday, July 24, 2017 5:29 AM, Peter Childs > wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From jamiedavis at us.ibm.com Mon Jul 24 15:45:26 2017 From: jamiedavis at us.ibm.com (James Davis) Date: Mon, 24 Jul 2017 14:45:26 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <33069.1500675853@turing-police.cc.vt.edu> References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: attlisjw.dat Type: application/octet-stream Size: 497 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jul 24 15:50:57 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 24 Jul 2017 14:50:57 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> Message-ID: I suppose the distinction between data, metadata and data IN metadata could be made. Whilst it is clear to me (us) now, perhaps the thought was that the data would be encrypted even if it was stored inside the metadata. My two pence. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of James Davis Sent: 24 July 2017 15:45 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Hey all, On the documentation of encryption restrictions and encryption/HAWC interplay... The encryption documentation currently states: "Secure storage uses encryption to make data unreadable to anyone who does not possess the necessary encryption keys...Only data, not metadata, is encrypted." The HAWC restrictions include: "Encrypted data is never stored in the recovery log..." If this is unclear, I'm open to suggestions for improvements. Cordially, Jamie ----- Original message ----- From: valdis.kletnieks at vt.edu Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Date: Fri, Jul 21, 2017 6:24 PM On Fri, 21 Jul 2017 22:04:32 -0000, Sven Oehme said: > i talked with a few others to confirm this, but unfortunate this is a > limitation of the code today (maybe not well documented which we will look > into). Encryption only encrypts data blocks, it doesn't encrypt metadata. > Hence, if encryption is enabled, we don't store data in the inode, because > then it wouldn't be encrypted. For the same reason HAWC and encryption are > incompatible. I can live with that restriction if it's documented better, thanks... [Document Icon]attq4saq.dat Type: application/pgp-signature Name: attq4saq.dat _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Mon Jul 24 15:57:13 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Jul 2017 15:57:13 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <33069.1500675853@turing-police.cc.vt.edu> , <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <1500908233.4387.194.camel@buzzard.me.uk> On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: > Hey all, > > On the documentation of encryption restrictions and encryption/HAWC > interplay... > > The encryption documentation currently states: > > "Secure storage uses encryption to make data unreadable to anyone who > does not possess the necessary encryption keys...Only data, not > metadata, is encrypted." > > The HAWC restrictions include: > > "Encrypted data is never stored in the recovery log..." > > If this is unclear, I'm open to suggestions for improvements. > Just because *DATA* is stored in the metadata does not make it magically metadata. It's still data so you could quite reasonably conclude that it is encrypted. We have now been disabused of this, but the documentation is not clear and needs clarifying. Perhaps say metadata blocks are not encrypted. Or just a simple data stored in inodes is not encrypted would suffice. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Mon Jul 24 16:49:07 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Jul 2017 11:49:07 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? 
In-Reply-To: <1500896590.4387.167.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> Message-ID: <17702.1500911347@turing-police.cc.vt.edu> On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > For an archive service how about only accepting files in actual > "archive" formats and then severely restricting the number of files a > user can have? > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. After having dealt with users who fill up disk storage for almost 4 decades now, I'm fully aware of those advantages. :) ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" in 1978, and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ square feet, and now 8T drives are all over the place...) On the flip side, my current project is migrating 5 petabytes of data from our old archive system that didn't have such rules (mostly due to politics and the fact that the underlying XFS filesystem uses a 4K blocksize so it wasn't as big an issue), so I'm stuck with what people put in there years ago. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Mon Jul 24 16:49:26 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 24 Jul 2017 15:49:26 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: Ilan, you must create some type of authentication mechanism for CES to work properly first. If you want a quick and dirty way that would just use your local /etc/passwd try this. /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined Mark -----Original Message----- From: Ilan Schwarts [mailto:ilan84 at gmail.com] Sent: Monday, July 24, 2017 5:37 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication Hi, I have gpfs with 2 Nodes (redhat). I am trying to create NFS share - So I would be able to mount and access it from another linux machine. I receive error: Current authentication: none is invalid. What do i need to configure ? PLEASE NOTE: I dont have the SMB package at the moment, I dont want authentication on the NFS export.. While trying to create NFS (I execute the following): [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" I receive the following error: [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. 
[root at LH20-GPFS1 ~]# mmuserauth service list FILE access not configured PARAMETERS VALUES ------------------------------------------------- OBJECT access not configured PARAMETERS VALUES ------------------------------------------------- [root at LH20-GPFS1 ~]# Some additional information on cluster: ============================== [root at LH20-GPFS1 ~]# mmlsmgr file system manager node ---------------- ------------------ fs_gpfs01 10.10.158.61 (LH20-GPFS1) Cluster manager node: 10.10.158.61 (LH20-GPFS1) [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: LH20-GPFS1 GPFS cluster id: 10777108240438931454 GPFS UID domain: LH20-GPFS1 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 quorum _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions From valdis.kletnieks at vt.edu Mon Jul 24 17:35:34 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Jul 2017 12:35:34 -0400 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: <27469.1500914134@turing-police.cc.vt.edu> On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: > Hi, > I have gpfs with 2 Nodes (redhat). > I am trying to create NFS share - So I would be able to mount and > access it from another linux machine. > While trying to create NFS (I execute the following): > [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* > Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" You can get away with little to no authentication for NFSv3, but not for NFSv4. Try with Protocols=3 only and mmuserauth service create --type userdefined that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS client tells you". This of course only works sanely if each NFS export is only to a set of machines in the same administrative domain that manages their UID/GIDs. Exporting to two sets of machines that don't coordinate their UID/GID space is, of course, where hilarity and hijinks ensue.... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From luke.raimbach at googlemail.com Mon Jul 24 23:23:03 2017 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Mon, 24 Jul 2017 22:23:03 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> <1500906086.571.9.camel@qmul.ac.uk> <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> Message-ID: Switch of CCR and see what happens. On Mon, 24 Jul 2017, 15:40 Adam Huffman, wrote: > smem is recommended here > > Cheers, > Adam > > -- > > Adam Huffman > Senior HPC and Cloud Systems Engineer > The Francis Crick Institute > 1 Midland Road > London NW1 1AT > > T: 020 3796 1175 > E: adam.huffman at crick.ac.uk > W: www.crick.ac.uk > > > > > > On 24 Jul 2017, at 15:21, Peter Childs wrote: > > > top > > but ps gives the same value. > > [root at dn29 ~]# ps auww -q 4444 > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > root 4444 2.7 22.3 10537600 5472580 ? S /usr/lpp/mmfs/bin/mmfsd > > Thanks for the help > > Peter. > > > On Mon, 2017-07-24 at 14:10 +0000, Jim Doherty wrote: > > How are you identifying the high memory usage? > > > On Monday, July 24, 2017 9:30 AM, Peter Childs > wrote: > > > I've had a look at mmfsadm dump malloc and it looks to agree with the > output from mmdiag --memory. and does not seam to account for the excessive > memory usage. > > The new machines do have idleSocketTimout set to 0 from what your saying > it could be related to keeping that many connections between nodes working. > > Thanks in advance > > Peter. > > > > > [root at dn29 ~]# mmdiag --memory > > === mmdiag: memory === > mmfsd heap size: 2039808 bytes > > > Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") > 128 bytes in use > 17500049370 hard limit on memory usage > 1048576 bytes committed to regions > 1 number of regions > 555 allocations > 555 frees > 0 allocation failures > > > Statistics for MemoryPool id 2 ("Shared Segment") > 42179592 bytes in use > 17500049370 hard limit on memory usage > 56623104 bytes committed to regions > 9 number of regions > 100027 allocations > 79624 frees > 0 allocation failures > > > Statistics for MemoryPool id 3 ("Token Manager") > 2099520 bytes in use > 17500049370 hard limit on memory usage > 16778240 bytes committed to regions > 1 number of regions > 4 allocations > 0 frees > 0 allocation failures > > > On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: > > There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 > shared memory segments. To see the memory utilization of the shared > memory segments run the command mmfsadm dump malloc . The statistics > for memory pool id 2 is where maxFilesToCache/maxStatCache objects are > and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. > > You might want to upgrade to later PTF as there was a PTF to fix a memory > leak that occurred in tscomm associated with network connection drops. > > > On Monday, July 24, 2017 5:29 AM, Peter Childs > wrote: > > > We have two GPFS clusters. > > One is fairly old and running 4.2.1-2 and non CCR and the nodes run > fine using up about 1.5G of memory and is consistent (GPFS pagepool is > set to 1G, so that looks about right.) 
> > The other one is "newer" running 4.2.1-3 with CCR and the nodes keep > increasing in there memory usage, starting at about 1.1G and are find > for a few days however after a while they grow to 4.2G which when the > node need to run real work, means the work can't be done. > > I'm losing track of what maybe different other than CCR, and I'm trying > to find some more ideas of where to look. > > I'm checked all the standard things like pagepool and maxFilesToCache > (set to the default of 4000), workerThreads is set to 128 on the new > gpfs cluster (against default 48 on the old) > > I'm not sure what else to look at on this one hence why I'm asking the > community. > > Thanks in advance > > Peter Childs > ITS Research Storage > Queen Mary University of London. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > The Francis Crick Institute Limited is a registered charity in England and > Wales no. 1140062 and a company registered in England and Wales no. > 06885462, with its registered office at 1 Midland Road London NW1 1AT > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 25 05:52:11 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 25 Jul 2017 07:52:11 +0300 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: <27469.1500914134@turing-police.cc.vt.edu> References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). 
>> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From ulmer at ulmer.org Tue Jul 25 06:33:13 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Tue, 25 Jul 2017 01:33:13 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500908233.4387.194.camel@buzzard.me.uk> References: <33069.1500675853@turing-police.cc.vt.edu> <28986.1500671597@turing-police.cc.vt.edu> <1500908233.4387.194.camel@buzzard.me.uk> Message-ID: <1233C5A4-A8C9-4A56-AEC3-AE65DBB5D346@ulmer.org> > On Jul 24, 2017, at 10:57 AM, Jonathan Buzzard > wrote: > > On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: >> Hey all, >> >> On the documentation of encryption restrictions and encryption/HAWC >> interplay... >> >> The encryption documentation currently states: >> >> "Secure storage uses encryption to make data unreadable to anyone who >> does not possess the necessary encryption keys...Only data, not >> metadata, is encrypted." >> >> The HAWC restrictions include: >> >> "Encrypted data is never stored in the recovery log..." >> >> If this is unclear, I'm open to suggestions for improvements. >> > > Just because *DATA* is stored in the metadata does not make it magically > metadata. It's still data so you could quite reasonably conclude that it > is encrypted. > [?] > JAB. +1. Also, "Encrypted data is never stored in the recovery log?" does not make it clear whether: The data that is supposed to be encrypted is not written to the recovery log. The data that is supposed to be encrypted is written to the recovery log, but is not encrypted there. Thanks, -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Tue Jul 25 10:02:14 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Jul 2017 10:02:14 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <17702.1500911347@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> Message-ID: <1500973334.4387.201.camel@buzzard.me.uk> On Mon, 2017-07-24 at 11:49 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > > > For an archive service how about only accepting files in actual > > "archive" formats and then severely restricting the number of files a > > user can have? > > > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. 
> > After having dealt with users who fill up disk storage for almost 4 decades > now, I'm fully aware of those advantages. :) > > ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" in 1978, > and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ square feet, and > now 8T drives are all over the place...) > > On the flip side, my current project is migrating 5 petabytes of data from our > old archive system that didn't have such rules (mostly due to politics and the > fact that the underlying XFS filesystem uses a 4K blocksize so it wasn't as big > an issue), so I'm stuck with what people put in there years ago. I would be tempted to zip up the directories and move them ziped ;-) JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From john.hearns at asml.com Tue Jul 25 10:30:28 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 25 Jul 2017 09:30:28 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500973334.4387.201.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: I agree with Jonathan. In my experience, if you look at why there are many small files being stored by researchers, these are either the results of data acquisition - high speed cameras, microscopes, or in my experience a wind tunnel. Or the images are a sequence of images produced by a simulation which are later post-processed into a movie or Ensight/Paraview format. When questioned, the resaechers will always say "but I would like to keep this data available just in case". In reality those files are never looked at again. And as has been said if you have a tape based archiving system you could end up with thousands of small files being spread all over your tapes. So it is legitimate to make zips / tars of directories like that. I am intrigued to see that GPFS has a policy facility which can call an external program. That is useful. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Buzzard Sent: Tuesday, July 25, 2017 11:02 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? On Mon, 2017-07-24 at 11:49 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > > > For an archive service how about only accepting files in actual > > "archive" formats and then severely restricting the number of files > > a user can have? > > > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. > > After having dealt with users who fill up disk storage for almost 4 > decades now, I'm fully aware of those advantages. :) > > ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" > in 1978, and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ > square feet, and now 8T drives are all over the place...) > > On the flip side, my current project is migrating 5 petabytes of data > from our old archive system that didn't have such rules (mostly due to > politics and the fact that the underlying XFS filesystem uses a 4K > blocksize so it wasn't as big an issue), so I'm stuck with what people put in there years ago. I would be tempted to zip up the directories and move them ziped ;-) JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7Ce8a4016223414177bf9408d4d33bdb31%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=pean0PRBgJJmtbZ7TwO%2BxiSvhKsba%2FRGI9VUCxhp6kM%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From jonathan at buzzard.me.uk Tue Jul 25 12:22:49 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Jul 2017 12:22:49 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: <1500981769.4387.222.camel@buzzard.me.uk> On Tue, 2017-07-25 at 09:30 +0000, John Hearns wrote: > I agree with Jonathan. > > In my experience, if you look at why there are many small files being > stored by researchers, these are either the results of data acquisition > - high speed cameras, microscopes, or in my experience a wind tunnel. > Or the images are a sequence of images produced by a simulation which > are later post-processed into a movie or Ensight/Paraview format. When > questioned, the resaechers will always say "but I would like to keep > this data available just in case". In reality those files are never > looked at again. And as has been said if you have a tape based > archiving system you could end up with thousands of small files being > spread all over your tapes. So it is legitimate to make zips / tars of > directories like that. > Note that rules on data retention may require them to keep them for 10 years, so it is not unreasonable. Letting them spew thousands of files into an "archive" is not sensible. I was thinking of ways of getting the users to do it, and I guess leaving them with zero available file number quota in the new system would force them to zip up their data so they could add new stuff ;-) Archives in my view should have no quota on the space, only quota's on the number of files. Of course that might not be very popular. On reflection I think I would use a policy to restrict to files ending with .zip/.ZIP only. It's an archive and this format is effectively open source, widely understood and cross platform, and with the ZIP64 version will now stand the test of time too. Given it's an archive I would have a script that ran around setting all the files to immutable 7 days after creation too. 
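Something along these lines is roughly what I have in mind for the immutable part - only a sketch, untested; the fileset name 'archive' and the paths are invented, and you should check the ILM chapter for the exact format of the file list that mmapplypolicy hands to the external script:

  # /tmp/lock_archive.pol -- select week-old zips in the archive fileset
  RULE EXTERNAL LIST 'lockup' EXEC '/usr/local/sbin/lock_archive.sh'
  RULE 'oldzips' LIST 'lockup'
       FOR FILESET ('archive')
       WHERE LOWER(NAME) LIKE '%.zip'
         AND (CURRENT_TIMESTAMP - MODIFICATION_TIME) > INTERVAL '7' DAYS

The helper script just marks whatever got selected as immutable:

  #!/bin/bash
  # /usr/local/sbin/lock_archive.sh -- called by mmapplypolicy
  # $1 is the operation (TEST or LIST), $2 the file listing the selected
  # files, one per line with the path after the " -- " separator
  case "$1" in
    TEST) exit 0 ;;
    LIST) sed 's/^.* -- //' "$2" | while IFS= read -r f; do
            mmchattr -i yes "$f"    # set the GPFS immutable flag
          done ;;
  esac
  exit 0

Then run it from cron, say nightly:

  mmapplypolicy /gpfs/archive -P /tmp/lock_archive.pol -I yes
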
Or maybe change the ownership and set a readonly ACL to the original user. Need to stop them changing stuff after the event if you are going to use to as part of your anti research fraud measures. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Tue Jul 25 17:11:45 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 25 Jul 2017 12:11:45 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500973334.4387.201.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: <88035.1500999105@turing-police.cc.vt.edu> On Tue, 25 Jul 2017 10:02:14 +0100, Jonathan Buzzard said: > I would be tempted to zip up the directories and move them ziped ;-) Not an option, unless you want to come here and re-write the researcher's tracking systems that knows where they archived a given run, and teach it "Except now it's in a .tar.gz in that directory, or perhaps one or two directories higher up, under some name". Yes, researchers do that. And as the joke goes: "What's the difference between a tenured professor and a terrorist?" "You can negotiate with a terrorist..." Plus remember that most of these directories are currently scattered across multiple tapes, which means "zip up a directory" may mean reading as many as 10 to 20 tapes just to get the directory on disk so you can zip it up. As it is, I had to write code that recall and processes all the files on tape 1, *wherever they are in the file system*, free them from the source disk, recall and process all the files on tape 2, repeat until tape 3,857. (And due to funding issues 5 years ago which turned into a "who paid for what tapes" food fight, most of the tapes ended up with files from entirely different file systems on them, going into different filesets on the destination). (And in fact, the migration is currently hosed up because a researcher *is* doing pretty much that - recalling all the files from one directory, then the next, then the next, to get files they need urgently for a deliverable but haven't been moved to the new system. So rather than having 12 LTO-5 drives to multistream the tape recalls, I've got 12 recalls fighting for one drive while the researcher's processing is hogging the other 11, due to the way the previous system prioritizes in-line opens of files versus bulk recalls) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From scbatche at us.ibm.com Tue Jul 25 21:46:45 2017 From: scbatche at us.ibm.com (Scott C Batchelder) Date: Tue, 25 Jul 2017 15:46:45 -0500 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf Message-ID: Hello: I am wondering if I can get some more information on the gpfsperf tool for baseline testing GPFS. I want to record GPFS read and write performance for a file system on the cluster before I enable DMAPI and configure the HSM interface. The README for the tool does not offer much insight in how I should run this tool based on the cluster or file system settings. The cluster that I will be running this tool on will not have MPI installed and will have multiple file systems in the cluster. Are there some best practises for running this tool? 
For example: - Should the number of threads equal the number of NSDs for the file system? or equal to the number of nodes? - If I execute a large multi-threaded run of this tool from a single node in the cluster, will that give me an accurate result of the performance of the file system? Any feedback is appreciated. Thanks. Sincerely, Scott Batchelder Phone: 1-281-883-7926 E-mail: scbatche at us.ibm.com 12301 Kurland Dr Houston, TX 77034-4812 United States -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 2022 bytes Desc: not available URL: From valdis.kletnieks at vt.edu Wed Jul 26 00:59:08 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 25 Jul 2017 19:59:08 -0400 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf In-Reply-To: References: Message-ID: <13777.1501027148@turing-police.cc.vt.edu> On Tue, 25 Jul 2017 15:46:45 -0500, "Scott C Batchelder" said: > - Should the number of threads equal the number of NSDs for the file > system? or equal to the number of nodes? Depends on what definition of "throughput" you are interested in. If your configuration has 50 clients banging on 5 NSD servers, your numbers for 5 threads and 50 threads are going to tell you subtly different things... (Basically, one thread per NSD is going to tell you the maximum that one client can expect to get with little to no contention, while one per client will tell you about the maximum *aggregate* that all 50 can get together - which is probably still giving each individual client less throughput than one-to-one....) We usually test with "exactly one thread total", "one thread per server", and "keep piling the clients on till the total number doesn't get any bigger". Also be aware that it only gives you insight to your workload performance if your workload is comprised of large file access - if your users are actually doing a lot of medium or small files, that changes the results dramatically as you end up possibly pounding on metadata more than the actual data.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From varun.mittal at in.ibm.com Wed Jul 26 04:42:27 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Wed, 26 Jul 2017 09:12:27 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. 
Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From varun.mittal at in.ibm.com Wed Jul 26 04:44:24 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Wed, 26 Jul 2017 09:14:24 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. 
this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Wed Jul 26 18:28:55 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Jul 2017 17:28:55 +0000 Subject: [gpfsug-discuss] Lost disks Message-ID: I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it's due to a back end disk issue or if it's a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn't appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren't 100% sure that something at the disk array couldn't have caused this. Is there an easy way to see if there is still data on these disks? Short of a full restore from backup what other options might they have? The mmlsnsd -X show's blanks for device and device type now. # mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- INGEST_FILEMGR_xis2301 0A23982E57FD995D - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2301 0A23982E57FD995D - - ingest-filemgr02.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - ingest-filemgr02.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2303 0A23982E57FD9962 - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. 
If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From kums at us.ibm.com Wed Jul 26 18:37:45 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Wed, 26 Jul 2017 13:37:45 -0400 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf In-Reply-To: <13777.1501027148@turing-police.cc.vt.edu> References: <13777.1501027148@turing-police.cc.vt.edu> Message-ID: Hi Scott, >>- Should the number of threads equal the number of NSDs for the file system? or equal to the number of nodes? >>- If I execute a large multi-threaded run of this tool from a single node in the cluster, will that give me an accurate result of the performance of the file system? To add to Valdis's note, the answer to the above also depends on the nodes, the network used for GPFS communication between client and server, and the storage performance capabilities that make up the GPFS cluster/network/storage stack. As an example, suppose the storage subsystem (including controller + disks) hosting the file system can deliver ~20 GB/s and the networking between NSD client and server is FDR 56Gb/s InfiniBand (with verbsRdma this is ~6 GB/s). Assuming one FDR-IB link (verbsPorts) is configured per NSD server as well as per client, you would need a minimum of 4 NSD servers (4 x 6 GB/s ==> 24 GB/s) to saturate the backend storage. So you would need to run gpfsperf (or any other parallel I/O benchmark) across a minimum of 4 GPFS NSD clients to saturate the backend storage. You can scale the gpfsperf thread counts (-th parameter) depending on access pattern (buffered/dio etc.) but this would only be able to drive load from a single NSD client node. If you would like to drive I/O load from multiple NSD client nodes and synchronize the parallel runs across those nodes for accuracy, then gpfsperf-mpi is strongly recommended. You would need to use MPI to launch gpfsperf-mpi across multiple NSD client nodes and scale the MPI processes (one or more MPI processes per NSD client) accordingly to drive the I/O load for good performance. >>The cluster that I will be running this tool on will not have MPI installed and will have multiple file systems in the cluster. Without MPI, the alternative would be to use ssh or pdsh to launch gpfsperf across multiple nodes; however, if there are slow NSD clients the results may not be accurate (slow clients take longer, and once the faster clients have finished they get all the network/storage resources to themselves, skewing the performance analysis). You may also consider using parallel Iozone, as it can be run across multiple nodes using rsh/ssh with a combination of the "-+m" and "-t" options. http://iozone.org/docs/IOzone_msword_98.pdf ## -+m filename Use this file to obtain the configuration information of the clients for cluster testing. The file contains one line for each client. Each line has three fields. The fields are space delimited. A # sign in column zero is a comment line. The first field is the name of the client. The second field is the path, on the client, for the working directory where Iozone will execute. The third field is the path, on the client, for the executable Iozone. To use this option one must be able to execute commands on the clients without being challenged for a password. Iozone will start remote execution by using "rsh". To use ssh, export RSH=/usr/bin/ssh -t # Run Iozone in a throughput mode. This option allows the user to specify how many threads or processes to have active during the measurement. ## Hope this helps, -Kums
From: valdis.kletnieks at vt.edu To: gpfsug main discussion list Date: 07/25/2017 07:59 PM Subject: Re: [gpfsug-discuss] Baseline testing GPFS with gpfsperf Sent by: gpfsug-discuss-bounces at spectrumscale.org On Tue, 25 Jul 2017 15:46:45 -0500, "Scott C Batchelder" said: > - Should the number of threads equal the number of NSDs for the file > system? or equal to the number of nodes? Depends on what definition of "throughput" you are interested in. If your configuration has 50 clients banging on 5 NSD servers, your numbers for 5 threads and 50 threads are going to tell you subtly different things... (Basically, one thread per NSD is going to tell you the maximum that one client can expect to get with little to no contention, while one per client will tell you about the maximum *aggregate* that all 50 can get together - which is probably still giving each individual client less throughput than one-to-one....) We usually test with "exactly one thread total", "one thread per server", and "keep piling the clients on till the total number doesn't get any bigger". Also be aware that it only gives you insight into your workload performance if your workload is comprised of large file access - if your users are actually doing a lot of medium or small files, that changes the results dramatically as you end up possibly pounding on metadata more than the actual data.... [attachment "att0twxd.dat" deleted by Kumaran Rajaram/Arlington/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From Robert.Oesterlin at nuance.com Wed Jul 26 18:45:35 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 26 Jul 2017 17:45:35 +0000 Subject: [gpfsug-discuss] Lost disks Message-ID: One way this could possibly happen would be if a system is being installed (I'm assuming this is Linux) while the FC adapter is active; the OS install will then see the disks and wipe out the NSD descriptor on them. (Which is why the NSD V2 format was invented, to prevent this from happening.) If you don't lose all of the descriptors, it's sometimes possible to manually re-construct the missing header information - I'm assuming since you opened a PMR, IBM has looked at this. This is a scenario I've had to recover from - twice. Back-end array issue seems unlikely to me; I'd keep looking at the systems with access to those LUNs and see what commands/operations could have been run.
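A quick, read-only way to see which LUNs still have something that looks like a descriptor (just a sketch - the device names are placeholders, and this only covers the old v1 layout where the descriptor sits in the first few sectors of the raw disk):

  for d in /dev/sdb /dev/sdc; do     # substitute the real LUN devices
      echo "== $d"
      dd if=$d bs=512 count=8 2>/dev/null | strings | grep "NSD descriptor"
  done

A LUN whose header is intact should print a line like "NSD descriptor for ... created by GPFS ..."; the ones that come back empty are the ones that have been overwritten.
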
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client that has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it's due to a back end disk issue or if it's a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn't appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren't 100% sure that something at the disk array couldn't have caused this.
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From oehmes at gmail.com Wed Jul 26 19:18:38 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 26 Jul 2017 18:18:38 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: It can happen for multiple reasons; one is a Linux install, but unfortunately there are significantly simpler explanations. Linux, as well as the BIOS in some servers, from time to time looks for empty disks and puts a GPT label on them if the disk doesn't have one, etc. This thread explains a lot of this: https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014439222 This is why we implemented the NSD V2 format a long time ago. Unfortunately there is no way to convert a V1 NSD to a V2 NSD on an existing filesystem except to remove the NSDs one at a time and re-add them after you have upgraded the system to at least GPFS 4.1 (I would recommend a later version like 4.2.3). Some more details are here in this thread: https://www.ibm.com/developerworks/community/forums/html/threadTopic?id=5c1ee5bc-41b8-4318-a74e-4d962f82ce2e but a quick summary of the benefits of V2:
- Support for GPT NSDs
- Adds a standard disk partition table (GPT type) to NSDs
- Disk label support for Linux
- The new GPFS NSD v2 format provides the following benefits:
  - Includes a partition table so that the disk is recognized as a GPFS device
  - Adjusts data alignment to support disks with a 4 KB physical block size
  - Adds backup copies of some key GPFS data structures
  - Expands some reserved areas to allow for future growth
The main reason we can't convert from V1 to V2 is that the on-disk format changed significantly, so we would have to move on-disk data, which is very risky. Hope that explains this. Sven On Wed, Jul 26, 2017 at 10:29 AM Mark Bush wrote: > I have a client that has had an issue where all of the nsd disks disappeared in > the cluster recently. Not sure if it's due to a back end disk issue or if > it's a reboot that did it. But in their PMR they were told that all that > data is lost now and that the disk headers didn't appear as GPFS disk > headers. How on earth could something like that happen? Could it be a > backend disk thing? They are confident that nobody tried to reformat disks > but aren't 100% sure that something at the disk array couldn't have caused > this. > > > > Is there an easy way to see if there is still data on these disks? > > Short of a full restore from backup what other options might they have? > > > > The mmlsnsd -X shows blanks for device and device type now. 
> > > > # mmlsnsd -X > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > INGEST_FILEMGR_xis2301 0A23982E57FD995D - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2301 0A23982E57FD995D - - > ingest-filemgr02.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - > ingest-filemgr02.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2303 0A23982E57FD9962 - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > > > > > *Mark* > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > Sirius Computer Solutions > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Jul 26 19:19:15 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Jul 2017 18:19:15 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. 
But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 26 20:05:59 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 26 Jul 2017 19:05:59 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: IBM has a procedure for it that may work in some cases, but you?re manually editing the NSD descriptors on disk. Contact IBM if you think an NSD has been lost to descriptor being re-written. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 1:19 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Jul 27 11:39:28 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 27 Jul 2017 10:39:28 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: Mark, I once rescued a system which had the disk partition on the OS disks deleted. (This was a system with a device mapper RAID pair of OS disks). Download a copy of sysrescue http://www.system-rescue-cd.org/ and create a bootable USB stick (or network boot). When you boot the system in sysrescue it has a utility to scan disks which will identify existing partitions, even if the partition table has been erased. I can?t say if this will do anything with the disks in your system, but this is certainly worth a try if you suspect that the data is all still on disk. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: Wednesday, July 26, 2017 8:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. 
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Thu Jul 27 11:58:08 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 11:58:08 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: <1501153088.26563.39.camel@buzzard.me.uk> On Wed, 2017-07-26 at 17:45 +0000, Oesterlin, Robert wrote: > One way this could possible happen would be a system is being > installed (I?m assuming this is Linux) and the FC adapter is active; > then the OS install will see disks and wipe out the NSD descriptor on > those disks. (Which is why the NSD V2 format was invented, to prevent > this from happening) If you don?t lose all of the descriptors, it?s > sometimes possible to manually re-construct the missing header > information - I?m assuming since you opened a PMR, IBM has looked at > this. This is a scenario I?ve had to recover from - twice. Back-end > array issue seems unlikely to me, I?d keep looking at the systems with > access to those LUNs and see what commands/operations could have been > run. I would concur that this is the most likely scenario; an install where for whatever reason the machine could see the disks and they are gone. I know that RHEL6 and its derivatives will do that for you. Has happened to me at previous place of work where another admin forgot to de-zone a server, went to install CentOS6 as part of a cluster upgrade from CentOS5 and overwrote all the NSD descriptors. Thing is GPFS does not look at the NSD descriptors that much. So in my case it was several days before it was noticed, and only then because I rebooted the last NSD server as part of a rolling upgrade of GPFS. I could have cruised for weeks/months with no NSD descriptors if I had not restarted all the NSD servers. The moral of this is the overwrite could have take place quite some time ago. Basically if the disks are all missing then the NSD descriptor has been overwritten, and the protestations of the client are irrelevant. The chances of the disk array doing it to *ALL* the disks is somewhere around ? IMHO. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From richard.rupp at us.ibm.com Thu Jul 27 12:28:35 2017 From: richard.rupp at us.ibm.com (RICHARD RUPP) Date: Thu, 27 Jul 2017 07:28:35 -0400 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: If you are under IBM support, leverage IBM for help. A third party utility has the possibility of making it worse. From: John Hearns To: gpfsug main discussion list Date: 07/27/2017 06:40 AM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org Mark, I once rescued a system which had the disk partition on the OS disks deleted. (This was a system with a device mapper RAID pair of OS disks). Download a copy of sysrescue http://www.system-rescue-cd.org/ and create a bootable USB stick (or network boot). When you boot the system in sysrescue it has a utility to scan disks which will identify existing partitions, even if the partition table has been erased. I can?t say if this will do anything with the disks in your system, but this is certainly worth a try if you suspect that the data is all still on disk. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: Wednesday, July 26, 2017 8:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. 
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush < Mark.Bush at siriuscom.com> Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Thu Jul 27 12:58:50 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 12:58:50 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: <1501156730.26563.49.camel@strath.ac.uk> On Thu, 2017-07-27 at 07:28 -0400, RICHARD RUPP wrote: > If you are under IBM support, leverage IBM for help. A third party > utility has the possibility of making it worse. > The chances of recovery are slim in the first place from this sort of problem. At least with v1 NSD descriptors. Further IBM have *ALREADY* told him the data is lost, I quote But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. So in this scenario you have little to loose trying something because you are now on your own. Worst case scenario is that whatever you try does not work, which leave you no worse of than you are now. Well apart from lost time for the restore, but you might have started that already to somewhere else. I was once told by IBM (nine years ago now) that my GPFS file system was caput and to arrange a restore from tape. At which point some fiddling by myself fixed the problem and a 100TB restore was no longer required. However this was not due to overwritten NSD descriptors. When that happened the two file systems effected had to be restored. Well bizarrely one was still mounted and I was able to rsync the data off. However the point is that at this stage fiddling with third party tools is the only option left. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From UWEFALKE at de.ibm.com Thu Jul 27 15:18:02 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 27 Jul 2017 16:18:02 +0200 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501156730.26563.49.camel@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> Message-ID: "Just doing something" makes things worse usually. Whether a 3rd party tool knows how to handle GPFS NSDs can be doubted (as long as it is not dedicated to that purpose). First, I'd look what is actually on the sectors where the NSD headers used to be, and try to find whether data beyond that area were also modified (if the latter is the case, restoring the NSDs does not make much sense as data and/or metadata (depending on disk usage) would also be corrupted. If you are sure that just the NSD header area has been affected, you might try to trick GPFS in getting just the information into the header area needed that GPFS recognises the devices as the NSDs they were. 
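For the looking - and as a safety copy before anyone writes a single byte - something along these lines might do (a sketch only; the device names are placeholders and everything here is strictly read-only):

  mkdir -p /root/nsd-headers
  for d in /dev/mapper/lun01 /dev/mapper/lun02; do   # substitute the real NSD devices
      # keep the first 8 sectors (4 KiB) of each suspect LUN for inspection and as a backup
      dd if=$d of=/root/nsd-headers/$(basename $d).first4k bs=512 count=8
      od --address-radix=x -xc /root/nsd-headers/$(basename $d).first4k
  done
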
The first 4 kiB of a v1 NSD from a VM on my laptop look like $ cat nsdv1head | od --address-radix=x -xc 000000 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000200 cf70 4192 0000 0100 0000 3000 e930 a028 p 317 222 A \0 \0 \0 001 \0 \0 \0 0 0 351 ( 240 000210 a8c0 ce7a a251 1f92 a251 1a92 0000 0800 300 250 z 316 Q 242 222 037 Q 242 222 032 \0 \0 \0 \b 000220 0000 f20f 0000 0000 0000 0000 0000 0000 \0 \0 017 362 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000230 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000400 93d2 7885 0000 0100 0000 0002 141e 64a8 322 223 205 x \0 \0 \0 001 \0 \0 002 \0 036 024 250 d 000410 a8c0 ce7a a251 3490 0000 fa0f 0000 0800 300 250 z 316 Q 242 220 4 \0 \0 017 372 \0 \0 \0 \b 000420 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000480 534e 2044 6564 6373 6972 7470 726f 6620 N S D d e s c r i p t o r f 000490 726f 2f20 6564 2f76 6476 2062 7263 6165 o r / d e v / v d b c r e a 0004a0 6574 2064 7962 4720 4650 2053 6f4d 206e t e d b y G P F S M o n 0004b0 614d 2079 3732 3020 3a30 3434 303a 2034 M a y 2 7 0 0 : 4 4 : 0 4 0004c0 3032 3331 000a 0000 0000 0000 0000 0000 2 0 1 3 \n \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 0004d0 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000e00 4c5f 4d56 0000 017d 0000 017d 0000 017d _ L V M \0 \0 } 001 \0 \0 } 001 \0 \0 } 001 000e10 0000 017d 0000 0000 0000 0000 0000 0000 \0 \0 } 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000e20 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000e30 0000 0000 0000 0000 0000 0000 017d 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 } 001 \0 \0 000e40 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 001000 I suppose, the important area starts at 0x0200 (ie. with the second 512Byte sector) and ends at 0x04df (which would be within the 3rd 512Bytes sector, hence the 2nd and 3rd sectors appear crucial). I think that there is some more space before the payload area starts. Without knowledge what exactly has to go into the header, I'd try to create an NSD on one or two (new) disks, save the headers, then create an FS on them, save the headers again, check if anything has changed. So, creating some new NSDs, checking what keys might appear there and in the cluster configuration could get you very close to craft the header information which is gone. Of course, that depends on how dear the data on the gone FS AKA SG are and how hard it'd be to rebuild them otherwise (replay from backup, recalculate, ...) It seems not a bad idea to set aside the NSD headers of your NSDs in a back up :-) And also now: Before amending any blocks on your disks, save them! Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/27/2017 01:59 PM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org On Thu, 2017-07-27 at 07:28 -0400, RICHARD RUPP wrote: > If you are under IBM support, leverage IBM for help. A third party > utility has the possibility of making it worse. > The chances of recovery are slim in the first place from this sort of problem. At least with v1 NSD descriptors. Further IBM have *ALREADY* told him the data is lost, I quote But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. So in this scenario you have little to loose trying something because you are now on your own. Worst case scenario is that whatever you try does not work, which leave you no worse of than you are now. Well apart from lost time for the restore, but you might have started that already to somewhere else. I was once told by IBM (nine years ago now) that my GPFS file system was caput and to arrange a restore from tape. At which point some fiddling by myself fixed the problem and a 100TB restore was no longer required. However this was not due to overwritten NSD descriptors. When that happened the two file systems effected had to be restored. Well bizarrely one was still mounted and I was able to rsync the data off. However the point is that at this stage fiddling with third party tools is the only option left. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Thu Jul 27 16:09:31 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 16:09:31 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: <1501156730.26563.49.camel@strath.ac.uk> Message-ID: <1501168171.26563.56.camel@strath.ac.uk> On Thu, 2017-07-27 at 16:18 +0200, Uwe Falke wrote: > "Just doing something" makes things worse usually. Whether a 3rd > party tool knows how to handle GPFS NSDs can be doubted (as long as it > is not dedicated to that purpose). It might usually, but IBM have *ALREADY* given up in this case and told the customer their data is toast. Under these circumstances other than wasting time that could have been spent profitably on a restore it is *IMPOSSIBLE* to make the situation worse. [SNIP] > It seems not a bad idea to set aside the NSD headers of your NSDs in a > back up :-) > And also now: Before amending any blocks on your disks, save them! > It's called NSD v2 descriptor format, so rather than use raw disks they are in a GPT partition, and for good measure a backup copy is stored at the end of the disk too. Personally if I had any v1 NSD's in a file system I would have a plan for a series of mmdeldisk/mmcrnsd/mmadddisk to get them all to v2 sooner rather than later. JAB. -- Jonathan A. 
Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Robert.Oesterlin at nuance.com Thu Jul 27 16:28:02 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 27 Jul 2017 15:28:02 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format each is? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jul 27 16:51:29 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 27 Jul 2017 17:51:29 +0200 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501168171.26563.56.camel@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> Message-ID: gpfsug-discuss-bounces at spectrumscale.org wrote on 07/27/2017 05:09:31 PM: > From: Jonathan Buzzard > To: gpfsug main discussion list > Date: 07/27/2017 05:09 PM > Subject: Re: [gpfsug-discuss] Lost disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On Thu, 2017-07-27 at 16:18 +0200, Uwe Falke wrote: > > > "Just doing something" makes things worse usually. Whether a 3rd > > party tool knows how to handle GPFS NSDs can be doubted (as long as it > > is not dedicated to that purpose). > > It might usually, but IBM have *ALREADY* given up in this case and told > the customer their data is toast. Under these circumstances other than > wasting time that could have been spent profitably on a restore it is > *IMPOSSIBLE* to make the situation worse. SCNR: It is always possible to make things worse. However, of course, if the efforts to do research on that system appear too expensive compared to the possible gain, then it is wise to give up and restore data from backup to a new file system. > > [SNIP] > > > It seems not a bad idea to set aside the NSD headers of your NSDs in a > > back up :-) > > And also now: Before amending any blocks on your disks, save them! > > > > It's called NSD v2 descriptor format, so rather than use raw disks they > are in a GPT partition, and for good measure a backup copy is stored at > the end of the disk too. > > Personally if I had any v1 NSD's in a file system I would have a plan > for a series of mmdeldisk/mmcrnsd/mmadddisk to get them all to v2 sooner > rather than later. Yep, but I suppose the gone NSDs were v1. Then, there might be some restrictions blocking the move from NSDv1 to NSDv2 (old FS level still req.ed, or just the hugeness of a file system). And you never know, if some tool runs wild due to logical failures it overwrites all GPT copies on a disk and you're lost again (but of course NSDv2 has been a tremendous step ahead). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From luke.raimbach at googlemail.com Thu Jul 27 17:09:42 2017 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Thu, 27 Jul 2017 16:09:42 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> References: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: mmfsadm test readdescraw On Thu, 27 Jul 2017, 16:28 Oesterlin, Robert, wrote: > I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format > each is? > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Jul 27 17:17:20 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 27 Jul 2017 16:17:20 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <50669E00-32A8-4AC7-A729-CB961F96ECAE@nuance.com> Right - but what field do I look at? Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Luke Raimbach Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 11:10 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? mmfsadm test readdescraw -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Jul 27 19:26:45 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 19:26:45 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> Message-ID: <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> On 27/07/17 16:51, Uwe Falke wrote: [SNIP] > SCNR: It is always possible to make things worse. > However, of course, if the efforts to do research on that system appear > too expensive compared to the possible gain, then it is wise to give up > and restore data from backup to a new file system. > Explain to me when IBM have washed their hands of the situation; that is they deem the file system unrecoverable and will take no further action to help the customer, how under these circumstances it is possible for it to get any worse attempting to recover the situation yourself? The answer is you can't so and are talking complete codswallop. In general you are right, in this situation you are utterly and totally wrong. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From chair at spectrumscale.org Thu Jul 27 21:19:15 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Thu, 27 Jul 2017 21:19:15 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> Message-ID: Guys, this is supposed to be a community mailing list where people can come and ask questions and we can have healthy debate, but please can we keep it calm? Thanks Simon Group Chair From sfadden at us.ibm.com Thu Jul 27 21:33:19 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Thu, 27 Jul 2017 20:33:19 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: References: , <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Jul 28 00:29:47 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 28 Jul 2017 00:29:47 +0100 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> References: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: On 27/07/17 16:28, Oesterlin, Robert wrote: > I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format > each is? Well on anything approaching a recent Linux lsblk should as I understand it should show GPT partitions on v2 NSD's. Normally a v1 NSD would show up as a raw block device. I guess you could have created the v1 NSD's inside a partition but that was not normal practice. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From daniel.kidger at uk.ibm.com Fri Jul 28 12:03:40 2017 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Fri, 28 Jul 2017 11:03:40 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: , <1501156730.26563.49.camel@strath.ac.uk><1501168171.26563.56.camel@strath.ac.uk><3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 28 12:46:47 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 28 Jul 2017 11:46:47 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Scott Fadden Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 3:33 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? 
# mmfsadm test readdescraw /dev/dm-14 | grep " original format" original format version 1600, cur version 1700 (mgr 1700, helper 1700, mnode 1700) The harder part is what version number = v2 and what matches version 1. The real answer is there is not a simple one, it is not really v1 vs v2 it is what feature you are interested in. Just one small example 4K Disk SECTOR support started in 1403 Dynamically enabling quotas started in 1404 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 28 13:44:11 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 28 Jul 2017 12:44:11 +0000 Subject: [gpfsug-discuss] LROC example Message-ID: <8103C497-EFA2-41E3-A047-4C3A3AA3EC0B@nuance.com> For those of you considering LROC, you may find this interesting. LROC can be very effective in some job mixes, as shown below. This is in a compute cluster of about 400 nodes. Each compute node has a 100GB LROC. In this particular job mix, LROC was recalling 3-4 times the traffic that was going to the NSDs. I see other cases where?s it?s less effective. [cid:image001.png at 01D30775.4ACF3D20] Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 54425 bytes Desc: image001.png URL: From knop at us.ibm.com Fri Jul 28 13:44:26 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 28 Jul 2017 08:44:26 -0400 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> References: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Message-ID: Bob, I believe the NSD format version (v1 vs v2) is shown in the " format version" line that starts with "NSDid" : # mmfsadm test readdescraw /dev/dm-11 NSD descriptor in sector 64 of /dev/dm-11 NSDid: 9461C0A85788693A format version: 1403 Label: It should say "1403" when the format is v2. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 07/28/2017 07:47 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Scott Fadden Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 3:33 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? 
# mmfsadm test readdescraw /dev/dm-14 | grep " original format" original format version 1600, cur version 1700 (mgr 1700, helper 1700, mnode 1700) The harder part is what version number = v2 and what matches version 1. The real answer is there is not a simple one, it is not really v1 vs v2 it is what feature you are interested in. Just one small example 4K Disk SECTOR support started in 1403 Dynamically enabling quotas started in 1404 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gcorneau at us.ibm.com Fri Jul 28 20:07:54 2017 From: gcorneau at us.ibm.com (Glen Corneau) Date: Fri, 28 Jul 2017 14:07:54 -0500 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: References: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Message-ID: Just a note for my AIX folks out there (and I know there's at least one!): When NSDv2 (version 1403) disks are defined in AIX we *don't* create GPTs on those LUNs. However with GPFS (Spectrum Scale) installed on AIX we will place the NSD name in the "VG" column of lsvg. But yes, we've had situations of customers creating new VGs on existing GPFS LUNs (force!) and destroying file systems. ------------------ Glen Corneau Power Systems Washington Systems Center gcorneau at us.ibm.com From: "Felipe Knop" To: gpfsug main discussion list Date: 07/28/2017 07:45 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Bob, I believe the NSD format version (v1 vs v2) is shown in the " format version" line that starts with "NSDid" : # mmfsadm test readdescraw /dev/dm-11 NSD descriptor in sector 64 of /dev/dm-11 NSDid: 9461C0A85788693A format version: 1403 Label: It should say "1403" when the format is v2. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 07/28/2017 07:47 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Sun Jul 30 04:22:25 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Sat, 29 Jul 2017 23:22:25 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? 
In-Reply-To: <1500908233.4387.194.camel@buzzard.me.uk> References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> <1500908233.4387.194.camel@buzzard.me.uk> Message-ID: Jonathan, all, We'll be introducing some clarification into the publications to highlight that data is not stored in the inode for encrypted files. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/24/2017 10:57 AM Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Sent by: gpfsug-discuss-bounces at spectrumscale.org On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: > Hey all, > > On the documentation of encryption restrictions and encryption/HAWC > interplay... > > The encryption documentation currently states: > > "Secure storage uses encryption to make data unreadable to anyone who > does not possess the necessary encryption keys...Only data, not > metadata, is encrypted." > > The HAWC restrictions include: > > "Encrypted data is never stored in the recovery log..." > > If this is unclear, I'm open to suggestions for improvements. > Just because *DATA* is stored in the metadata does not make it magically metadata. It's still data so you could quite reasonably conclude that it is encrypted. We have now been disabused of this, but the documentation is not clear and needs clarifying. Perhaps say metadata blocks are not encrypted. Or just a simple data stored in inodes is not encrypted would suffice. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Jul 31 05:57:44 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 31 Jul 2017 00:57:44 -0400 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501153088.26563.39.camel@buzzard.me.uk> References: <1501153088.26563.39.camel@buzzard.me.uk> Message-ID: Jonathan, Regarding >> Thing is GPFS does not look at the NSD descriptors that much. So in my >> case it was several days before it was noticed, and only then because I >> rebooted the last NSD server as part of a rolling upgrade of GPFS. I >> could have cruised for weeks/months with no NSD descriptors if I had not >> restarted all the NSD servers. The moral of this is the overwrite could >> have take place quite some time ago. While GPFS does not normally read the NSD descriptors in the course of performing file system operations, as of 4.1.1 a periodic check is done on the content of various descriptors, and a message like [E] On-disk NSD descriptor of is valid but has a different ID. 
ID in cache is and ID on-disk is should get issued if the content of the descriptor on disk changes. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/27/2017 06:58 AM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, 2017-07-26 at 17:45 +0000, Oesterlin, Robert wrote: > One way this could possible happen would be a system is being > installed (I?m assuming this is Linux) and the FC adapter is active; > then the OS install will see disks and wipe out the NSD descriptor on > those disks. (Which is why the NSD V2 format was invented, to prevent > this from happening) If you don?t lose all of the descriptors, it?s > sometimes possible to manually re-construct the missing header > information - I?m assuming since you opened a PMR, IBM has looked at > this. This is a scenario I?ve had to recover from - twice. Back-end > array issue seems unlikely to me, I?d keep looking at the systems with > access to those LUNs and see what commands/operations could have been > run. I would concur that this is the most likely scenario; an install where for whatever reason the machine could see the disks and they are gone. I know that RHEL6 and its derivatives will do that for you. Has happened to me at previous place of work where another admin forgot to de-zone a server, went to install CentOS6 as part of a cluster upgrade from CentOS5 and overwrote all the NSD descriptors. Thing is GPFS does not look at the NSD descriptors that much. So in my case it was several days before it was noticed, and only then because I rebooted the last NSD server as part of a rolling upgrade of GPFS. I could have cruised for weeks/months with no NSD descriptors if I had not restarted all the NSD servers. The moral of this is the overwrite could have take place quite some time ago. Basically if the disks are all missing then the NSD descriptor has been overwritten, and the protestations of the client are irrelevant. The chances of the disk array doing it to *ALL* the disks is somewhere around ? IMHO. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Mon Jul 31 18:30:34 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 31 Jul 2017 17:30:34 +0000 Subject: [gpfsug-discuss] Auditing Message-ID: Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? 
Am I barking up the wrong tree here, or is there a better way to get this type of data from a Spectrum Scale filesystem?

Mark

From Renar.Grunenberg at huk-coburg.de Mon Jul 31 18:44:21 2017
From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar)
Date: Mon, 31 Jul 2017 17:44:21 +0000
Subject: [gpfsug-discuss] Quota and hardlimit enforcement
Message-ID: <200a086c1740448da544e667c03887e5 at SMXRF105.msg.hukrf.de>

Hallo All,
we are on version 4.2.3.2 and see some misunderstanding in the enforcement of hard-limit definitions on a fileset quota. What we see is this: we put some 200 GB files onto the following quota definitions: quota 150 GB, limit 250 GB, grace none. After creating one 200 GB file we hit the soft quota limit; that's OK. But after the second file was created we expected an I/O error, and it doesn't happen. We have defined all the well-known parameters (-Q, ...) on the filesystem. Is this a bug or a feature? mmcheckquota was already run first.
Regards Renar.

Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon: 09561 96-44110
Telefax: 09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet: www.huk.de
________________________________
HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas (stv.).
________________________________
Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.

This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________

From eric.wonderley at vt.edu Mon Jul 31 18:54:52 2017
From: eric.wonderley at vt.edu (J.
Eric Wonderley) Date: Mon, 31 Jul 2017 13:54:52 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar < Renar.Grunenberg at huk-coburg.de> wrote: > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the enforcement > of hardlimit definitions on a flieset quota. What we see is we put some 200 > GB files on following quota definitions: quota 150 GB Limit 250 GB Grace > none. > After the creating of one 200 GB we hit the softquota limit, thats ok. But > After the the second file was created!! we expect an io error but it don?t > happen. We define all well know Parameters (-Q,..) on the filesystem . Is > this a bug or a Feature? mmcheckquota are already running at first. > Regards Renar. > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ------------------------------ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > ------------------------------ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfosburg at mdanderson.org Mon Jul 31 18:56:46 2017 From: jfosburg at mdanderson.org (Fosburgh,Jonathan) Date: Mon, 31 Jul 2017 17:56:46 +0000 Subject: [gpfsug-discuss] Auditing In-Reply-To: References: Message-ID: At present there is not a method to audit file access. Jonathan Fosburgh Principal Application Systems Analyst Storage Team IT Operations jfosburg at mdanderson.org (713) 745-9346 On 07/31/2017 12:30 PM, Mark Bush wrote: Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? 
Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The information contained in this e-mail message may be privileged, confidential, and/or protected from disclosure. This e-mail message may contain protected health information (PHI); dissemination of PHI should comply with applicable federal and state laws. If you are not the intended recipient, or an authorized representative of the intended recipient, any further review, disclosure, use, dissemination, distribution, or copying of this message or any attachment (or the information contained therein) is strictly prohibited. If you think that you have received this e-mail message in error, please notify the sender by return e-mail and delete all references to it and its contents from your systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jul 31 19:02:30 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 31 Jul 2017 18:02:30 +0000 Subject: [gpfsug-discuss] Re Auditing Message-ID: We run a policy that looks like this: -- cut here -- define(daysToEpoch, days(timestamp('1970-01-01 00:00:00.0'))) define(unixTS, char(int( (( days(\$1) - daysToEpoch ) * 86400) + ( hour(\$1) * 3600) + (minute(\$1) * 60) + (second(\$1)) )) ) rule 'dumpall' list '"$filesystem"' DIRECTORIES_PLUS SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' ) -- cut here -- Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Monday, July 31, 2017 at 12:31 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Auditing Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Mon Jul 31 19:05:37 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 31 Jul 2017 18:05:37 +0000 Subject: [gpfsug-discuss] Re Auditing In-Reply-To: References: Message-ID: Brilliant. Thanks Bob. 
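For readers of the archive, one way such a policy is typically driven (a sketch, not necessarily the wrapper used above: the file system name, the list name and the output prefix are assumptions, and the list name in the rule above is expanded from $filesystem by whatever preprocesses the file) is with mmapplypolicy in defer mode so the records land in a flat file:

# audit.pol holds the rule above plus a matching external list declaration, e.g.:
#   RULE EXTERNAL LIST 'audit' EXEC ''
#   RULE 'dumpall' LIST 'audit' DIRECTORIES_PLUS SHOW( ... )
mmapplypolicy gpfs01 -P audit.pol -I defer -f /tmp/audit -L 1
# The '|'-delimited records should end up in /tmp/audit.list.audit, one line per selected file.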
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com]
Sent: Monday, July 31, 2017 1:03 PM
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Re Auditing

We run a policy that looks like this:

-- cut here --
define(daysToEpoch, days(timestamp('1970-01-01 00:00:00.0')))
define(unixTS, char(int( (( days(\$1) - daysToEpoch ) * 86400) + ( hour(\$1) * 3600) + (minute(\$1) * 60) + (second(\$1)) )) )

rule 'dumpall' list '"$filesystem"' DIRECTORIES_PLUS
SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' )
-- cut here --

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: on behalf of Mark Bush
Reply-To: gpfsug main discussion list
Date: Monday, July 31, 2017 at 12:31 PM
To: "gpfsug-discuss at spectrumscale.org"
Subject: [EXTERNAL] [gpfsug-discuss] Auditing

Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree here, or is there a better way to get this type of data from a Spectrum Scale filesystem?

From makaplan at us.ibm.com Mon Jul 31 19:26:52 2017
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Mon, 31 Jul 2017 14:26:52 -0400
Subject: [gpfsug-discuss] Re Auditing - timestamps
In-Reply-To: References: Message-ID:

The "ILM" chapter in the Admin Guide has some tips, among which:

18. You can convert a time interval value to a number of seconds with the SQL cast syntax, as in the following example:

define([toSeconds],[(($1) SECONDS(12,6))])
define([toUnixSeconds],[toSeconds($1 - '1970-1-1@0:00')])
RULE external list b
RULE list b SHOW('sinceNow=' toSeconds(current_timestamp-modification_time) )
RULE external list c
RULE list c SHOW('sinceUnixEpoch=' toUnixSeconds(modification_time) )

The following method is also supported:

define(access_age_in_days,( INTEGER(( (CURRENT_TIMESTAMP - ACCESS_TIME) SECONDS)) /(24*3600.0) ) )
RULE external list w exec ''
RULE list w weight(access_age_in_days) show(access_age_in_days)

--marc of GPFS

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From pinto at scinet.utoronto.ca Mon Jul 31 19:46:53 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 31 Jul 2017 14:46:53 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: <20170731144653.160355y5whmerokd@support.scinet.utoronto.ca> Renar For as long as the usage is below the hard limit (space or inodes) and below the grace period you'll be able to write. I don't think you can set the grace period to an specific value as a quota parameter, such as none. That is set at the filesystem creation time. BTW, grace period limit has been a mystery to me for many years. My impression is that GPFS keeps changing it internally depending on the position of the moon. I think ours is 2 hours, but at times I can see users writing for longer. Jaime Quoting "Grunenberg, Renar" : > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the > enforcement of hardlimit definitions on a flieset quota. What we see > is we put some 200 GB files on following quota definitions: quota > 150 GB Limit 250 GB Grace none. > After the creating of one 200 GB we hit the softquota limit, thats > ok. But After the the second file was created!! we expect an io > error but it don?t happen. We define all well know Parameters > (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota > are already running at first. > Regards Renar. > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
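If the goal in this thread is to have application writes stop at the hard limit rather than ride out a grace period, one common approach (a sketch; the file system name gpfs01 and fileset name appdata are made up) is to set the soft and hard block limits to the same value and then watch the in-doubt column:

# Set soft limit = hard limit on the fileset so there is no grace window to ride out
mmsetquota gpfs01:appdata --block 250G:250G
# Verify what is charged against the fileset, including the in-doubt value
mmlsquota -j appdata gpfs01
mmrepquota -j gpfs01

In-doubt shares can still let usage overshoot the hard limit briefly on a busy cluster, so a small margin below the real stop line is worth keeping.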
From Renar.Grunenberg at huk-coburg.de Mon Jul 31 20:04:56 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 19:04:56 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 
9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 31 20:21:46 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 31 Jul 2017 19:21:46 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar, I?m sure this is the case, but I don?t see anywhere in this thread where this is explicitly stated ? you?re not doing your tests as root, are you? root, of course, is not bound by any quotas. Kevin On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > wrote: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. 
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Mon Jul 31 20:30:20 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 19:30:20 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hallo Kevin, thanks for your hint i will check these tomorrow, and yes as root, lol. Regards Renar Renar Grunenberg Abteilung Informatik ? 
Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Buterbaugh, Kevin L Gesendet: Montag, 31. Juli 2017 21:22 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar, I?m sure this is the case, but I don?t see anywhere in this thread where this is explicitly stated ? you?re not doing your tests as root, are you? root, of course, is not bound by any quotas. Kevin On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > wrote: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. 
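For reference, explicit fileset block limits of the kind described in the quoted message are normally set along these lines on a 4.2.x cluster (device and fileset names are placeholders, and the exact syntax is worth confirming against the mmsetquota man page for the installed release):

    # 150 GB soft / 250 GB hard block limit on the fileset
    mmsetquota gpfs01:myfileset --block 150G:250G
    # confirm the limits and watch the grace column once the soft limit is exceeded
    mmlsquota -j myfileset gpfs01
    # default block/file grace periods can be edited with mmedquota -t (here for fileset quotas)
    mmedquota -t -j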
If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Mon Jul 31 21:03:53 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 31 Jul 2017 16:03:53 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> In addition, the in_doubt column is a function of the data turnover and the internal GPFS accounting synchronization period (beyond root control). The higher the in_doubt values, the less accurate the real amount of space/inodes a user/group/fileset has in the filesystem. What I noticed in practice is that the in_doubt values only get worse over time, and work against the quotas, making them hit the limits sooner. Therefore, you may wish to run an 'mmcheckquota' crontab job once or twice a day, to reset the in_doubt column to zero more often. GPFS has a very high lag to do this on its own in the most recent versions, and seldom really catches up on a very active filesystem. If your grace period is set to 7 days I can assure you that in an HPC environment it is effectively the equivalent of not having quotas. You should set it to 2 hours or 4 hours. In an environment such as ours a runaway process can easily generate 500TB of data or 1 billion inodes in a few hours, and choke the file system for all users/jobs. Jaime Quoting "Buterbaugh, Kevin L" : > Hi Renar, > > I'm sure this is the case, but I don't see anywhere in this thread > where this is explicitly stated - you're not doing your tests as > root, are you? root, of course, is not bound by any quotas. > > Kevin > > On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > > > wrote: > > > Hallo J. Eric, hallo Jaime, > Ok after we hit the softlimit we see that the grace period goes to > 7 days. I think that's the default. But what does it mean? > After we reach the "hard" limit we additionally see the GBytes in_doubt. > My interpretation: we can now write many GB until the no-space-left event > in the filesystem. > But our intention is to restrict some applications to write only up to > the hard limit in the fileset. Any hints to accomplish this? > > > > Renar Grunenberg > Abteilung Informatik - Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ________________________________ > HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrtümlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information.
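A crontab entry of the kind Jaime suggests above could look roughly like this sketch (file system name, schedule and log path are illustrative only):

    # /etc/cron.d/mmcheckquota -- reconcile the in_doubt quota figures twice a day
    0 6,18 * * * root /usr/lpp/mmfs/bin/mmcheckquota gpfs01 >> /var/log/mmcheckquota.log 2>&1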
> Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > > > > Von: > gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric > Wonderley > Gesendet: Montag, 31. Juli 2017 19:55 > An: gpfsug main discussion list > > > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement > > Hi Renar: > What does 'mmlsquota -j fileset filesystem' report? > I did not think you would get a grace period of none unless the > hardlimit=softlimit. > > On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > > > wrote: > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the > enforcement of hardlimit definitions on a flieset quota. What we see > is we put some 200 GB files on following quota definitions: quota > 150 GB Limit 250 GB Grace none. > After the creating of one 200 GB we hit the softquota limit, thats > ok. But After the the second file was created!! we expect an io > error but it don?t happen. We define all well know Parameters > (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota > are already running at first. > Regards Renar. > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > > Telefon: > > 09561 96-44110 > > Telefax: > > 09561 96-44104 > > E-Mail: > > Renar.Grunenberg at huk-coburg.de > > Internet: > > www.huk.de > > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. 
(MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 31 21:11:14 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 31 Jul 2017 20:11:14 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> Message-ID: <3789811E-523F-47AE-93F3-E2985DD84D60@vanderbilt.edu> Jaime, That?s heavily workload dependent. We run a traditional HPC cluster and have a 7 day grace on home and 14 days on scratch. By setting the soft and hard limits appropriately we?ve slammed the door on many a runaway user / group / fileset. YMMV? Kevin On Jul 31, 2017, at 3:03 PM, Jaime Pinto > wrote: If your grace period is set to 7 days I can assure you that in an HPC environment it's the equivalent of not having quotas effectively. You should set it to 2 hours or 4 hours. In an environment such as ours a runway process can easily generate 500TB of data or 1 billion inodes in few hours, and choke the file system to all users/jobs. Jaime ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Sat Jul 1 10:20:18 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Sat, 1 Jul 2017 10:20:18 +0100 Subject: [gpfsug-discuss] Mass UID migration suggestions In-Reply-To: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> References: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> Message-ID: On 30/06/17 16:20, hpc-luke at uconn.edu wrote: > Hello, > > We're trying to change most of our users uids, is there a clean way to > migrate all of one users files with say `mmapplypolicy`? We have to change the > owner of around 273539588 files, and my estimates for runtime are around 6 days. > > What we've been doing is indexing all of the files and splitting them up by > owner which takes around an hour, and then we were locking the user out while we > chown their files. I made it multi threaded as it weirdly gave a 10% speedup > despite my expectation that multi threading access from a single node would not > give any speedup. > > Generally I'm looking for advice on how to make the chowning faster. Would > spreading the chowning processes over multiple nodes improve performance? Should > I not stat the files before running lchown on them, since lchown checks the file > before changing it? I saw mention of inodescan(), in an old gpfsug email, which > speeds up disk read access, by not guaranteeing that the data is up to date. We > have a maintenance day coming up where all users will be locked out, so the file > handles(?) from GPFS's perspective will not be able to go stale. Is there a > function with similar constraints to inodescan that I can use to speed up this > process? My suggestion is to do some development work in C to write a custom program to do it for you. That way you can hook into the GPFS API to leverage the fast file system scanning API. Take a look at the tsbackup.C file in the samples directory. 
Obviously this is going to require someone with appropriate coding skills to develop. On the other hand given it is a one off and input is strictly controlled so error checking is a one off, then couple hundred lines C tops. My tip for this would be load the new UID's into a sparse array so you can just use the current UID to index into the array for the new UID, for speeding things up. It burns RAM but these days RAM is cheap and plentiful and speed is the major consideration here. This should in theory be able to do this in a few hours with this technique. One thing to bear in mind is that once the UID change is complete you will have to backup the entire file system again. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From ilan84 at gmail.com Tue Jul 4 09:16:43 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 11:16:43 +0300 Subject: [gpfsug-discuss] Fail to mount file system Message-ID: Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I am trying to make it work. There are 2 nodes in a cluster: [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active The Cluster status is: [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: MyCluster.LH20-GPFS2 GPFS cluster id: 10777108240438931454 GPFS UID domain: MyCluster.LH20-GPFS2 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 There is a file system: [root at LH20-GPFS1 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- fs_gpfs01 nynsd1 (directly attached) fs_gpfs01 nynsd2 (directly attached) [root at LH20-GPFS1 ~]# On each Node, There is folder /fs_gpfs01 The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. Whilte executing mmmount i get exception: [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. What am i doing wrong ? From scale at us.ibm.com Tue Jul 4 09:36:43 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 14:06:43 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 lab"? Is the file system corrupted ? Maybe this error is then due to file system corruption. Can you once try: mmmount fs_gpfs01 -a If this does not work then try: mmmount -o rs fs_gpfs01 Let me know which mount is working. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: gpfsug-discuss at spectrumscale.org Date: 07/04/2017 01:47 PM Subject: [gpfsug-discuss] Fail to mount file system Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I am trying to make it work. There are 2 nodes in a cluster: [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active The Cluster status is: [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: MyCluster.LH20-GPFS2 GPFS cluster id: 10777108240438931454 GPFS UID domain: MyCluster.LH20-GPFS2 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 There is a file system: [root at LH20-GPFS1 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- fs_gpfs01 nynsd1 (directly attached) fs_gpfs01 nynsd2 (directly attached) [root at LH20-GPFS1 ~]# On each Node, There is folder /fs_gpfs01 The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. Whilte executing mmmount i get exception: [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. What am i doing wrong ? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 09:38:28 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 11:38:28 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: I mean the person tried to configure it... didnt do good job so now its me to continue On Jul 4, 2017 11:37, "IBM Spectrum Scale" wrote: > What exactly do you mean by "I have received existing corrupted GPFS > 4.2.2 lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. 
> > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > ------------------------------------------------------------ > --------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine > cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jul 4 11:54:52 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 4 Jul 2017 10:54:52 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Message-ID: Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... 
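In practice the documented all-nodes procedure boils down to something like the sketch below (the package file name is a placeholder, and the authoritative sequence is the knowledge center page linked above):

    # confirm which gpfs.smb build each protocol node currently runs
    mmdsh -N cesNodes rpm -q gpfs.smb
    # stop SMB on all protocol nodes - this is the brief outage window
    mmces service stop SMB -a
    # install the new package on every protocol node
    mmdsh -N cesNodes 'rpm -Uvh /tmp/gpfs.smb-<new-version>.x86_64.rpm'
    # restart SMB once every node is on the same version
    mmces service start SMB -a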
Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 11:56:20 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 13:56:20 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. 
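Errors like the "Wrong medium type" and "Stale file handle" shown above are usually explained in the GPFS daemon log, so a first pass at debugging might look like this sketch (standard log location, nothing here is destructive):

    # most recent GPFS daemon log on the node where the mount failed
    tail -n 100 /var/adm/ras/mmfs.log.latest
    # kernel messages with readable timestamps, to see whether the xfs warning is old boot noise
    dmesg -T | tail -n 50
    # check whether any node currently has the file system mounted
    mmlsmount fs_gpfs01 -L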
> > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts From S.J.Thompson at bham.ac.uk Tue Jul 4 12:09:18 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 4 Jul 2017 11:09:18 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Message-ID: AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I?ve upgraded nodes one at a time over the course of a few days. 
Is the impact just that we won?t be supported, or will a hole open up beneath my feet and swallow me whole? I really don?t fancy the headache of getting approvals to get an outage of even 5 minutes at 6am?. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jul 4 12:12:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 4 Jul 2017 11:12:10 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 4 17:28:07 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 21:58:07 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: My bad gave the wrong command, the right one is: mmmount fs_gpfs01 -o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
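The information requested above can be gathered with something along these lines (output will differ per node; the read-only mmfsck is only a way to confirm suspected corruption and needs the file system unmounted everywhere):

    # NSD to local device mapping, device type and server list, as asked for above
    mmlsnsd -X
    # retry the mount with the options suggested above
    mmmount fs_gpfs01 -o rs
    # optional: read-only structure check, reports problems without repairing anything
    mmfsck fs_gpfs01 -n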
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 17:46:17 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 19:46:17 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Yes I am ok with deleting. I follow a guide from john olsen at the ibm team from tuscon.. but the guide had steps after the gpfs setup... Is there step by step guide for gpfs cluster setup other than the one in the ibm site? Thank My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------ ------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111- 0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system ------------------------------ [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > ------------------------------------------------------------ --------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcatana at gmail.com Tue Jul 4 17:47:09 2017 From: jcatana at gmail.com (Josh Catana) Date: Tue, 4 Jul 2017 12:47:09 -0400 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Check /var/adm/ras/mmfs.log.latest The dmesg xfs bug is probably from boot if you look at the dmesg with -T to show the timestamp On Jul 4, 2017 12:29 PM, "IBM Spectrum Scale" wrote: > My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs > > Also can you send output of mmlsnsd -X, need to check device type of the > NSDs. > > Are you ok with deleting the file system and disks and building everything > from scratch? > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. 
> > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: IBM Spectrum Scale > Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main > discussion list > Date: 07/04/2017 04:26 PM > Subject: Re: [gpfsug-discuss] Fail to mount file system > ------------------------------ > > > > [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a > Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... > LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmdsh: LH20-GPFS1 remote shell process had return code 32. > LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle > mmdsh: LH20-GPFS2 remote shell process had return code 32. > mmmount: Command failed. Examine previous error messages to determine > cause. > > [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 > mmmount: Mount point can not be a relative path name: rs > [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 > mmmount: Mount point can not be a relative path name: rs > > > > I recieve in "dmesg": > > [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk > [ 141.363422] hvt_cn_callback: unexpected netlink message! > [ 141.366153] hvt_cn_callback: unexpected netlink message! > [ 4479.292850] tracedev: loading out-of-tree module taints kernel. > [ 4479.292888] tracedev: module verification failed: signature and/or > required key missing - tainting kernel > [ 4482.928413] ------------[ cut here ]------------ > [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 > xfs_do_writepage+0x537/0x550 [xfs]() > [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) > tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 > mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils > i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc > binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif > crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc > hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy > libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod > [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE > ------------ 3.10.0-514.21.2.el7.x86_64 #1 > > On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale > wrote: > > What exactly do you mean by "I have received existing corrupted GPFS > 4.2.2 > > lab"? > > Is the file system corrupted ? Maybe this error is then due to file > system > > corruption. > > > > Can you once try: mmmount fs_gpfs01 -a > > If this does not work then try: mmmount -o rs fs_gpfs01 > > > > Let me know which mount is working. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------ > ------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > > (GPFS), then please post it to the public IBM developerWroks Forum at > > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) > > and you have an IBM software maintenance contract please contact > > 1-800-237-5511 <(800)%20237-5511> in the United States or your local > IBM Service Center in > > other countries. > > > > The forum is informally monitored as time permits and should not be used > for > > priority messages to the Spectrum Scale (GPFS) team. 
> > > > > > > > From: Ilan Schwarts > > To: gpfsug-discuss at spectrumscale.org > > Date: 07/04/2017 01:47 PM > > Subject: [gpfsug-discuss] Fail to mount file system > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ________________________________ > > > > > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > > am trying to make it work. > > There are 2 nodes in a cluster: > > [root at LH20-GPFS1 ~]# mmgetstate -a > > > > Node number Node name GPFS state > > ------------------------------------------ > > 1 LH20-GPFS1 active > > 3 LH20-GPFS2 active > > > > The Cluster status is: > > [root at LH20-GPFS1 ~]# mmlscluster > > > > GPFS cluster information > > ======================== > > GPFS cluster name: MyCluster.LH20-GPFS2 > > GPFS cluster id: 10777108240438931454 > > GPFS UID domain: MyCluster.LH20-GPFS2 > > Remote shell command: /usr/bin/ssh > > Remote file copy command: /usr/bin/scp > > Repository type: CCR > > > > Node Daemon node name IP address Admin node name Designation > > -------------------------------------------------------------------- > > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > > > There is a file system: > > [root at LH20-GPFS1 ~]# mmlsnsd > > > > File system Disk name NSD servers > > ------------------------------------------------------------ > --------------- > > fs_gpfs01 nynsd1 (directly attached) > > fs_gpfs01 nynsd2 (directly attached) > > > > [root at LH20-GPFS1 ~]# > > > > On each Node, There is folder /fs_gpfs01 > > The next step is to mount this fs_gpfs01 to be synced between the 2 > nodes. > > Whilte executing mmmount i get exception: > > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > > mmmount: Command failed. Examine previous error messages to determine > cause. > > > > > > What am i doing wrong ? > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > -- > > > - > Ilan Schwarts > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 4 19:15:49 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 23:45:49 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: You can refer to the concepts, planning and installation guide at the link ( https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1xx_library_prodoc.htm ) for finding detailed steps on setting up a cluster or creating a file system. Or open a PMR and work with IBM support to set it up. 
In your case (just as an example) you can use the below simple steps to delete and recreate the file system: 1) To delete file system and NSDs: a) Unmount file system - mmumount -a b) Delete file system - mmdelfs c) Delete NSDs - mmdelnsd "nynsd1;nynsd2" 2) To create file system with both disks in one system pool and having dataAndMetadata and data and metadata replica and directly attached to the nodes, you can use following steps: a) Create a /tmp/nsd file and fill it up with below information :::dataAndMetadata:1:nynsd1:system :::dataAndMetadata:2:nynsd2:system b) Use mmcrnsd -F /tmp/nsd to create NSDs c) Create file system using (just an example with assumptions on config) - mmcrfs /dev/fs_gpfs01 -F /tmp/nsd -A yes -B 256K -n 32 -m 2 -r 2 -T /fs_gpfs01 You can refer to above guide for configuring it in other ways as you want. If you have any issues with these steps you can raise PMR and follow proper channel to setup file system as well. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 10:16 PM Subject: Re: [gpfsug-discuss] Fail to mount file system Yes I am ok with deleting. I follow a guide from john olsen at the ibm team from tuscon.. but the guide had steps after the gpfs setup... Is there step by step guide for gpfs cluster setup other than the one in the ibm site? Thank My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... 
LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Wed Jul 5 08:02:19 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 10:02:19 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Hi, [root at LH20-GPFS2 ~]# mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- nynsd1 0A0A9E3D594D5CA8 - - LH20-GPFS2 (not found) directly attached nynsd2 0A0A9E3D594D5CA9 - - LH20-GPFS2 (not found) directly attached mmmount failed with -o rs root at LH20-GPFS2 ~]# mmmount fs_gpfs01 -o rs Wed Jul 5 09:58:29 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. and in logs /var/adm/ras/mmfs.log.latest: 2017-07-05_09:58:30.009+0300: [I] Command: mount fs_gpfs01 2017-07-05_09:58:30.890+0300: Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: Wrong medium type 2017-07-05_09:58:30.890+0300: [E] Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: [W] Command: err 48: mount fs_gpfs01 From scale at us.ibm.com Wed Jul 5 08:44:19 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 5 Jul 2017 13:14:19 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: >From mmlsnsd output can see that the disks are not found by gpfs (maybe some connection issue or they have been changed/removed from backend) Please open a PMR and work with IBM support to resolve this. 
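If the disks are physically present and are expected to be visible on these nodes, it is also worth checking whether GPFS device discovery is finding them at all; devices on paths that GPFS does not scan by default can be listed explicitly through the nsddevices user exit. A minimal sketch, with sdb and sdc as placeholders for the real devices (the template under /usr/lpp/mmfs/samples/nsddevices.sample has the full details):

cat > /var/mmfs/etc/nsddevices <<'EOF'
#!/bin/bash
# Minimal nsddevices sketch: print one candidate device per line
# in the form "deviceName deviceType". sdb and sdc are placeholders.
echo "sdb generic"
echo "sdc generic"
# exit 0 = use only the devices listed above (skip built-in discovery)
# exit 1 = also run GPFS's built-in device discovery afterwards
exit 1
EOF
chmod +x /var/mmfs/etc/nsddevices

Once discovery can see the disks, mmlsnsd -X should show a device path and devtype for each NSD instead of "(not found)".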
Regards, The Spectrum Scale (GPFS) team
------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale
(GPFS), then please post it to the public IBM developerWroks Forum at
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 .

If your query concerns a potential software error in Spectrum Scale (GPFS)
and you have an IBM software maintenance contract please contact
1-800-237-5511 in the United States or your local IBM Service Center in
other countries.

The forum is informally monitored as time permits and should not be used
for priority messages to the Spectrum Scale (GPFS) team.

From:   Ilan Schwarts
To:     IBM Spectrum Scale
Cc:     gpfsug main discussion list, gpfsug-discuss-bounces at spectrumscale.org
Date:   07/05/2017 12:32 PM
Subject:        Re: [gpfsug-discuss] Fail to mount file system

Hi,
[root at LH20-GPFS2 ~]# mmlsnsd -X

 Disk name   NSD volume ID      Device   Devtype   Node name    Remarks
---------------------------------------------------------------------------------------------------
 nynsd1      0A0A9E3D594D5CA8   -        -         LH20-GPFS2   (not found) directly attached
 nynsd2      0A0A9E3D594D5CA9   -        -         LH20-GPFS2   (not found) directly attached

mmmount failed with -o rs
root at LH20-GPFS2 ~]# mmmount fs_gpfs01 -o rs
Wed Jul 5 09:58:29 IDT 2017: mmmount: Mounting file systems ...
mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type
mmmount: Command failed. Examine previous error messages to determine cause.

and in logs /var/adm/ras/mmfs.log.latest:
2017-07-05_09:58:30.009+0300: [I] Command: mount fs_gpfs01
2017-07-05_09:58:30.890+0300: Failed to open fs_gpfs01.
2017-07-05_09:58:30.890+0300: Wrong medium type
2017-07-05_09:58:30.890+0300: [E] Failed to open fs_gpfs01.
2017-07-05_09:58:30.890+0300: [W] Command: err 48: mount fs_gpfs01

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From UWEFALKE at de.ibm.com  Wed Jul 5 09:00:23 2017
From: UWEFALKE at de.ibm.com (Uwe Falke)
Date: Wed, 5 Jul 2017 10:00:23 +0200
Subject: [gpfsug-discuss] Fail to mount file system
In-Reply-To:
References:
Message-ID:

Hi,
maybe you need to specify your NSDs via the nsddevices user exit (a script
that identifies the local physical devices used as GPFS Network Shared
Disks (NSDs)): list the NSD devices in such a script and place it under
/var/mmfs/etc/nsddevices. There is a template under
/usr/lpp/mmfs/samples/nsddevices.sample which should provide the necessary
details.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke

IT Specialist
High Performance Computing Services / Integrated Technology Services / Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefalke at de.ibm.com
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: Andreas Hasse, Thomas Wolter
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122

From ilan84 at gmail.com  Wed Jul 5 13:12:14 2017
From: ilan84 at gmail.com (Ilan Schwarts)
Date: Wed, 5 Jul 2017 15:12:14 +0300
Subject: [gpfsug-discuss] update smb package ?
Message-ID:

Hi, while trying to enable the SMB service I receive the following:

[root at LH20-GPFS1 ~]# mmces service enable smb
LH20-GPFS1: Cannot enable SMB service on LH20-GPFS1
LH20-GPFS1: mmcesop: Prerequisite libraries not found or correct version not
LH20-GPFS1: installed. Ensure gpfs.smb is properly installed.
LH20-GPFS1: mmcesop: Command failed. Examine previous error messages to determine cause.
mmdsh: LH20-GPFS1 remote shell process had return code 1.

Do I use a normal yum update? How do I solve this issue?
Thanks

From ilan84 at gmail.com  Wed Jul 5 13:18:54 2017
From: ilan84 at gmail.com (Ilan Schwarts)
Date: Wed, 5 Jul 2017 15:18:54 +0300
Subject: [gpfsug-discuss] Fwd: update smb package ?
In-Reply-To:
References:
Message-ID:

[root at LH20-GPFS1 ~]# rpm -qa | grep gpfs
gpfs.ext-4.2.2-0.x86_64
gpfs.msg.en_US-4.2.2-0.noarch
gpfs.gui-4.2.2-0.noarch
gpfs.gpl-4.2.2-0.noarch
gpfs.gskit-8.0.50-57.x86_64
gpfs.gss.pmsensors-4.2.2-0.el7.x86_64
gpfs.adv-4.2.2-0.x86_64
gpfs.java-4.2.2-0.x86_64
gpfs.gss.pmcollector-4.2.2-0.el7.x86_64
gpfs.base-4.2.2-0.x86_64
gpfs.crypto-4.2.2-0.x86_64
[root at LH20-GPFS1 ~]# uname -a
Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20
12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root at LH20-GPFS1 ~]#

From r.sobey at imperial.ac.uk  Wed Jul 5 13:23:10 2017
From: r.sobey at imperial.ac.uk (Sobey, Richard A)
Date: Wed, 5 Jul 2017 12:23:10 +0000
Subject: [gpfsug-discuss] Fwd: update smb package ?
In-Reply-To:
References:
Message-ID:

You don't have the gpfs.smb package installed.

Yum install gpfs.smb

Or install the package manually from /usr/lpp/mmfs//smb_rpms

[root at ces ~]# rpm -qa | grep gpfs
gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan Schwarts
Sent: 05 July 2017 13:19
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Fwd: update smb package ?

[root at LH20-GPFS1 ~]# rpm -qa | grep gpfs
gpfs.ext-4.2.2-0.x86_64
gpfs.msg.en_US-4.2.2-0.noarch
gpfs.gui-4.2.2-0.noarch
gpfs.gpl-4.2.2-0.noarch
gpfs.gskit-8.0.50-57.x86_64
gpfs.gss.pmsensors-4.2.2-0.el7.x86_64
gpfs.adv-4.2.2-0.x86_64
gpfs.java-4.2.2-0.x86_64
gpfs.gss.pmcollector-4.2.2-0.el7.x86_64
gpfs.base-4.2.2-0.x86_64
gpfs.crypto-4.2.2-0.x86_64
[root at LH20-GPFS1 ~]# uname -a
Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20
12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root at LH20-GPFS1 ~]#
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ilan84 at gmail.com  Wed Jul 5 13:29:11 2017
From: ilan84 at gmail.com (Ilan Schwarts)
Date: Wed, 5 Jul 2017 15:29:11 +0300
Subject: [gpfsug-discuss] Fwd: update smb package ?
In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base | 3.6 kB 00:00:00 epel/x86_64/metalink | 24 kB 00:00:00 epel | 4.3 kB 00:00:00 extras | 3.4 kB 00:00:00 updates | 3.4 kB 00:00:00 (1/4): epel/x86_64/updateinfo | 789 kB 00:00:00 (2/4): extras/7/x86_64/primary_db | 188 kB 00:00:00 (3/4): epel/x86_64/primary_db | 4.8 MB 00:00:00 (4/4): updates/7/x86_64/primary_db | 7.7 MB 00:00:01 Loading mirror speeds from cached hostfile * base: centos.spd.co.il * epel: mirror.nonstop.co.il * extras: centos.spd.co.il * updates: centos.spd.co.il No package gpfs.smb available. Error: Nothing to do [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ something is missing in my machine :) On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: > You don't have the gpfs.smb package installed. > > > > Yum install gpfs.smb > > > > Or install the package manually from /usr/lpp/mmfs//smb_rpms > > > > [root at ces ~]# rpm -qa | grep gpfs > > gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 > > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan Schwarts > Sent: 05 July 2017 13:19 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Fwd: update smb package ? > > > > [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs > > gpfs.ext-4.2.2-0.x86_64 > > gpfs.msg.en_US-4.2.2-0.noarch > > gpfs.gui-4.2.2-0.noarch > > gpfs.gpl-4.2.2-0.noarch > > gpfs.gskit-8.0.50-57.x86_64 > > gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 > > gpfs.adv-4.2.2-0.x86_64 > > gpfs.java-4.2.2-0.x86_64 > > gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 > > gpfs.base-4.2.2-0.x86_64 > > gpfs.crypto-4.2.2-0.x86_64 > > [root at LH20-GPFS1 ~]# uname -a > > Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 > > 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux > > [root at LH20-GPFS1 ~]# > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From r.sobey at imperial.ac.uk Wed Jul 5 13:41:29 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 5 Jul 2017 12:41:29 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: Ah... yes you need to download the protocols version of gpfs from Fix Central. Same GPFS but with the SMB/Object etc packages. -----Original Message----- From: Ilan Schwarts [mailto:ilan84 at gmail.com] Sent: 05 July 2017 13:29 To: gpfsug main discussion list ; Sobey, Richard A Subject: Re: [gpfsug-discuss] Fwd: update smb package ? [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base | 3.6 kB 00:00:00 epel/x86_64/metalink | 24 kB 00:00:00 epel | 4.3 kB 00:00:00 extras | 3.4 kB 00:00:00 updates | 3.4 kB 00:00:00 (1/4): epel/x86_64/updateinfo | 789 kB 00:00:00 (2/4): extras/7/x86_64/primary_db | 188 kB 00:00:00 (3/4): epel/x86_64/primary_db | 4.8 MB 00:00:00 (4/4): updates/7/x86_64/primary_db | 7.7 MB 00:00:01 Loading mirror speeds from cached hostfile * base: centos.spd.co.il * epel: mirror.nonstop.co.il * extras: centos.spd.co.il * updates: centos.spd.co.il No package gpfs.smb available. 
Error: Nothing to do [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ something is missing in my machine :) On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: > You don't have the gpfs.smb package installed. > > > > Yum install gpfs.smb > > > > Or install the package manually from /usr/lpp/mmfs//smb_rpms > > > > [root at ces ~]# rpm -qa | grep gpfs > > gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 > > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan > Schwarts > Sent: 05 July 2017 13:19 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Fwd: update smb package ? > > > > [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs > > gpfs.ext-4.2.2-0.x86_64 > > gpfs.msg.en_US-4.2.2-0.noarch > > gpfs.gui-4.2.2-0.noarch > > gpfs.gpl-4.2.2-0.noarch > > gpfs.gskit-8.0.50-57.x86_64 > > gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 > > gpfs.adv-4.2.2-0.x86_64 > > gpfs.java-4.2.2-0.x86_64 > > gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 > > gpfs.base-4.2.2-0.x86_64 > > gpfs.crypto-4.2.2-0.x86_64 > > [root at LH20-GPFS1 ~]# uname -a > > Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 > > 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux > > [root at LH20-GPFS1 ~]# > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From ilan84 at gmail.com Wed Jul 5 14:08:39 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 16:08:39 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: Sorry for newbish question, What do you mean by "from Fix Central", Do i need to define another repository for the yum ? or download manually ? its spectrum scale 4.2.2 On Wed, Jul 5, 2017 at 3:41 PM, Sobey, Richard A wrote: > Ah... yes you need to download the protocols version of gpfs from Fix Central. Same GPFS but with the SMB/Object etc packages. > > -----Original Message----- > From: Ilan Schwarts [mailto:ilan84 at gmail.com] > Sent: 05 July 2017 13:29 > To: gpfsug main discussion list ; Sobey, Richard A > Subject: Re: [gpfsug-discuss] Fwd: update smb package ? > > [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base > > | 3.6 kB 00:00:00 > epel/x86_64/metalink > > | 24 kB 00:00:00 > epel > > | 4.3 kB 00:00:00 > extras > > | 3.4 kB 00:00:00 > updates > > | 3.4 kB 00:00:00 > (1/4): epel/x86_64/updateinfo > > | 789 kB 00:00:00 > (2/4): extras/7/x86_64/primary_db > > | 188 kB 00:00:00 > (3/4): epel/x86_64/primary_db > > | 4.8 MB 00:00:00 > (4/4): updates/7/x86_64/primary_db > > | 7.7 MB 00:00:01 > Loading mirror speeds from cached hostfile > * base: centos.spd.co.il > * epel: mirror.nonstop.co.il > * extras: centos.spd.co.il > * updates: centos.spd.co.il > No package gpfs.smb available. > Error: Nothing to do > > > [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ > gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ > > > something is missing in my machine :) > > > On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: >> You don't have the gpfs.smb package installed. 
>> >> >> >> Yum install gpfs.smb >> >> >> >> Or install the package manually from /usr/lpp/mmfs//smb_rpms >> >> >> >> [root at ces ~]# rpm -qa | grep gpfs >> >> gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 >> >> >> >> >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan >> Schwarts >> Sent: 05 July 2017 13:19 >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] Fwd: update smb package ? >> >> >> >> [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs >> >> gpfs.ext-4.2.2-0.x86_64 >> >> gpfs.msg.en_US-4.2.2-0.noarch >> >> gpfs.gui-4.2.2-0.noarch >> >> gpfs.gpl-4.2.2-0.noarch >> >> gpfs.gskit-8.0.50-57.x86_64 >> >> gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 >> >> gpfs.adv-4.2.2-0.x86_64 >> >> gpfs.java-4.2.2-0.x86_64 >> >> gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 >> >> gpfs.base-4.2.2-0.x86_64 >> >> gpfs.crypto-4.2.2-0.x86_64 >> >> [root at LH20-GPFS1 ~]# uname -a >> >> Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 >> >> 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux >> >> [root at LH20-GPFS1 ~]# >> >> _______________________________________________ >> >> gpfsug-discuss mailing list >> >> gpfsug-discuss at spectrumscale.org >> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > -- > > > - > Ilan Schwarts -- - Ilan Schwarts From S.J.Thompson at bham.ac.uk Wed Jul 5 14:40:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 5 Jul 2017 13:40:46 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: IBM code comes from either IBM Passport Advantage (where you sign in with a corporate account that lists your product associations), or from IBM Fix Central (google it). Fix Central is supposed to be for service updates. Give the lack of experience, you may want to look at the install toolkit which ships with Spectrum Scale. Simon On 05/07/2017, 14:08, "gpfsug-discuss-bounces at spectrumscale.org on behalf of ilan84 at gmail.com" wrote: >Sorry for newbish question, >What do you mean by "from Fix Central", >Do i need to define another repository for the yum ? or download manually >? >its spectrum scale 4.2.2 > >On Wed, Jul 5, 2017 at 3:41 PM, Sobey, Richard A >wrote: >> Ah... yes you need to download the protocols version of gpfs from Fix >>Central. Same GPFS but with the SMB/Object etc packages. >> >> -----Original Message----- >> From: Ilan Schwarts [mailto:ilan84 at gmail.com] >> Sent: 05 July 2017 13:29 >> To: gpfsug main discussion list ; >>Sobey, Richard A >> Subject: Re: [gpfsug-discuss] Fwd: update smb package ? >> >> [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: >>fastestmirror, langpacks base >> >> | 3.6 kB 00:00:00 >> epel/x86_64/metalink >> >> | 24 kB 00:00:00 >> epel >> >> | 4.3 kB 00:00:00 >> extras >> >> | 3.4 kB 00:00:00 >> updates >> >> | 3.4 kB 00:00:00 >> (1/4): epel/x86_64/updateinfo >> >> | 789 kB 00:00:00 >> (2/4): extras/7/x86_64/primary_db >> >> | 188 kB 00:00:00 >> (3/4): epel/x86_64/primary_db >> >> | 4.8 MB 00:00:00 >> (4/4): updates/7/x86_64/primary_db >> >> | 7.7 MB 00:00:01 >> Loading mirror speeds from cached hostfile >> * base: centos.spd.co.il >> * epel: mirror.nonstop.co.il >> * extras: centos.spd.co.il >> * updates: centos.spd.co.il >> No package gpfs.smb available. 
>> Error: Nothing to do >> >> >> [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ >> gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ >> >> >> something is missing in my machine :) >> >> >> On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A >> wrote: >>> You don't have the gpfs.smb package installed. >>> >>> >>> >>> Yum install gpfs.smb >>> >>> >>> >>> Or install the package manually from /usr/lpp/mmfs//smb_rpms >>> >>> >>> >>> [root at ces ~]# rpm -qa | grep gpfs >>> >>> gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 >>> >>> >>> >>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan >>> Schwarts >>> Sent: 05 July 2017 13:19 >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] Fwd: update smb package ? >>> >>> >>> >>> [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs >>> >>> gpfs.ext-4.2.2-0.x86_64 >>> >>> gpfs.msg.en_US-4.2.2-0.noarch >>> >>> gpfs.gui-4.2.2-0.noarch >>> >>> gpfs.gpl-4.2.2-0.noarch >>> >>> gpfs.gskit-8.0.50-57.x86_64 >>> >>> gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 >>> >>> gpfs.adv-4.2.2-0.x86_64 >>> >>> gpfs.java-4.2.2-0.x86_64 >>> >>> gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 >>> >>> gpfs.base-4.2.2-0.x86_64 >>> >>> gpfs.crypto-4.2.2-0.x86_64 >>> >>> [root at LH20-GPFS1 ~]# uname -a >>> >>> Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 >>> >>> 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux >>> >>> [root at LH20-GPFS1 ~]# >>> >>> _______________________________________________ >>> >>> gpfsug-discuss mailing list >>> >>> gpfsug-discuss at spectrumscale.org >>> >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> -- >> >> >> - >> Ilan Schwarts > > > >-- > > >- >Ilan Schwarts >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From hpc-luke at uconn.edu Wed Jul 5 15:52:52 2017 From: hpc-luke at uconn.edu (hpc-luke at uconn.edu) Date: Wed, 05 Jul 2017 10:52:52 -0400 Subject: [gpfsug-discuss] Mass UID migration suggestions Message-ID: <595cfd44.kc2G2OUXdgiX+srO%hpc-luke@uconn.edu> Thank you both, I was already using the c++ stl hash map to do the mapping of uid_t to uid_t, but I will use that example to learn how to use the proper gpfs apis. And thank you for the ACL suggestion, as that is likely the best way to handle certain users who are logged in/running jobs constantly, where we would not like to force them to logout. And thank you for the reminder to re-run backups. Thank you for your time, Luke Storrs-HPC University of Connecticut From mweil at wustl.edu Wed Jul 5 16:51:50 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 5 Jul 2017 10:51:50 -0500 Subject: [gpfsug-discuss] pmcollector node Message-ID: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> Hello all, Question on the requirements on pmcollector node/s for a 500+ node cluster. Is there a sizing guide? What specifics should we scale? CPU Disks memory? Thanks Matt From kkr at lbl.gov Wed Jul 5 17:23:38 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 5 Jul 2017 09:23:38 -0700 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Message-ID: As I understand it, there is currently no way to collect just a subset of stats in a category. 
For example, CPU stats are: cpu_contexts cpu_guest cpu_guest_nice cpu_hiq cpu_idle cpu_interrupts cpu_iowait cpu_nice cpu_siq cpu_steal cpu_system cpu_user but I'm only interested in tracking a subset. The config file seems to want the category "CPU" which seems like an all-or-nothing approach. I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? Thanks, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 5 18:00:44 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 5 Jul 2017 17:00:44 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Message-ID: <11A5144D-A5AF-4829-B7D4-4313F357C6CB@nuance.com> Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Jul 5 19:22:14 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 5 Jul 2017 11:22:14 -0700 Subject: [gpfsug-discuss] Meaning of API Stats Category In-Reply-To: References: Message-ID: Thank you Eric. That did help. On Mon, Jun 12, 2017 at 2:01 PM, IBM Spectrum Scale wrote: > Hello Kristy, > > The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of > view of "applications" in the sense that they provide stats about I/O > requests made to files in GPFS file systems from user level applications > using POSIX interfaces like open(), close(), read(), write(), etc. > > This is in contrast to similarly named sensors without the "API" suffix, > like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O > requests made by the GPFS code to NSDs (disks) making up GPFS file systems. > > The relationship between application I/O and disk I/O might or might not > be obvious. Consider some examples. An application that starts > sequentially reading a file might, at least initially, cause more disk I/O > than expected because GPFS has decided to prefetch data. An application > write() might not immediately cause a the writing of disk blocks due to the > operation of the pagepool. Ultimately, application write()s might cause > twice as much data written to disk due to the replication factor of the > file system. Application I/O concerns itself with user data; disk I/O > might have to occur to handle the user data and associated file system > metadata (like inodes and indirect blocks). > > The difference between GPFSFileSystemAPI and GPFSNodeAPI: > GPFSFileSystemAPI reports stats for application I/O per filesystem per > node; GPFSNodeAPI reports application I/O stats per node. Similarly, > GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode > reports disk I/O stats per node. > > I hope this helps. 
> Eric Agar > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 06/12/2017 04:43 PM > Subject: Re: [gpfsug-discuss] Meaning of API Stats Category > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Kristy > > What I *think* the difference is: > > gpfs_fis: - calls to the GPFS file system interface > gpfs_fs: calls from the node that actually make it to the NSD > server/metadata > > The difference being what?s served out of the local node pagepool. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > *From: * on behalf of Kristy > Kallback-Rose > * Reply-To: *gpfsug main discussion list > > * Date: *Monday, June 12, 2017 at 3:17 PM > * To: *gpfsug main discussion list > * Subject: *[EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category > > Hi, > > Can anyone provide more detail about what is meant by the following two > categories of stats? The PDG has a limited description as far as I could > see. I'm not sure what is meant by Application PoV. Would the Grafana > bridge count as an "application"? > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfadden at us.ibm.com Wed Jul 5 19:50:24 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Wed, 5 Jul 2017 18:50:24 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks In-Reply-To: <11A5144D-A5AF-4829-B7D4-4313F357C6CB@nuance.com> Message-ID: What do You mean by category? Node class, metric type or something else? On Jul 5, 2017, 10:01:33 AM, Robert.Oesterlin at nuance.com wrote: From: Robert.Oesterlin at nuance.com To: gpfsug-discuss at spectrumscale.org Cc: Date: Jul 5, 2017 10:01:33 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sfadden at us.ibm.com Wed Jul 5 19:51:46 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Wed, 5 Jul 2017 18:51:46 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks In-Reply-To: Message-ID: Never mind just saw your earlier email On Jul 5, 2017, 11:50:24 AM, sfadden at us.ibm.com wrote: From: sfadden at us.ibm.com To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: Jul 5, 2017 11:50:24 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks What do You mean by category? Node class, metric type or something else? On Jul 5, 2017, 10:01:33 AM, Robert.Oesterlin at nuance.com wrote: From: Robert.Oesterlin at nuance.com To: gpfsug-discuss at spectrumscale.org Cc: Date: Jul 5, 2017 10:01:33 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Jul 6 06:37:33 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 6 Jul 2017 11:07:33 +0530 Subject: [gpfsug-discuss] pmcollector node In-Reply-To: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> References: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> Message-ID: Hi Anna, Can you please check if you can answer this. Or else let me know who to contact for this. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Matt Weil To: gpfsug-discuss at spectrumscale.org Date: 07/05/2017 09:22 PM Subject: [gpfsug-discuss] pmcollector node Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello all, Question on the requirements on pmcollector node/s for a 500+ node cluster. Is there a sizing guide? What specifics should we scale? CPU Disks memory? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Wei1.Guo at UTSouthwestern.edu Thu Jul 6 18:49:32 2017 From: Wei1.Guo at UTSouthwestern.edu (Wei Guo) Date: Thu, 6 Jul 2017 17:49:32 +0000 Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory Message-ID: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> Hi, All, We are testing to upgrade our clients to new RHEL 7.3 kernel with GPFS 4.2.1.0. When we have 3.10.0-514.26.2.el7, installing the gplbin has the following errors: # ./mmbuildgpl --build-package -v # cd /root/rpmbuild/RPMS/x86_64/ # rpm -ivh gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64.rpm Running transaction Installing : gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64 1/1 depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory depmod: ERROR: fstatat(4, mmfslinux.ko): No such file or directory depmod: ERROR: fstatat(4, tracedev.ko): No such file or directory depmod -a also show the three kernel extension not found. However, in the following directory, they are there. # pwd /lib/modules/3.10.0-514.26.2.el7.x86_64/extra # ls kernel mmfs26.ko mmfslinux.ko tracedev.ko The error does not show in a slightly older kernel -3.10.0-514.21.2 version. From https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Table 29, both versions should be supported. RHEL Distribution Latest Kernel Level Tested1 Minimum Kernel Level Required2 Minimum IBM Spectrum Scale Level Tested3 Minimum IBM Spectrum Scale Level Supported4 7.3 3.10.0-514 3.10.0-514 V4.1.1.11/V4.2.2.1 V4.1.1.11/V4.2.1.2 For technical reasons, this test node will not be added to production. A previous thread http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-April/001529.html indicated that this will be OK. However, it is better to get a clear conclusion before we update other client nodes. Shall we recompile the kernel? Thanks all. Wei Guo ________________________________ UT Southwestern Medical Center The future of medicine, today. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 6 18:52:44 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 6 Jul 2017 17:52:44 +0000 Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory In-Reply-To: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> References: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> Message-ID: Look in the kernel weak-updates directory, you will probably find some broken files in there. These come from things trying to update the kernel modules when you do the kernel upgrade. Just delete the three gpfs related ones and run depmod The safest way is to remove the gpfs.gplbin packages, then upgrade the kernel, reboot and add the new gpfs.gplbin packages for the new kernel. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Wei Guo [Wei1.Guo at UTSouthwestern.edu] Sent: 06 July 2017 18:49 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory Hi, All, We are testing to upgrade our clients to new RHEL 7.3 kernel with GPFS 4.2.1.0. 
When we have 3.10.0-514.26.2.el7, installing the gplbin has the following errors: # ./mmbuildgpl --build-package ?v # cd /root/rpmbuild/RPMS/x86_64/ # rpm -ivh gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64.rpm Running transaction Installing : gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64 1/1 depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory depmod: ERROR: fstatat(4, mmfslinux.ko): No such file or directory depmod: ERROR: fstatat(4, tracedev.ko): No such file or directory depmod -a also show the three kernel extension not found. However, in the following directory, they are there. # pwd /lib/modules/3.10.0-514.26.2.el7.x86_64/extra # ls kernel mmfs26.ko mmfslinux.ko tracedev.ko The error does not show in a slightly older kernel -3.10.0-514.21.2 version. From https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Table 29, both versions should be supported. RHEL Distribution Latest Kernel Level Tested1 Minimum Kernel Level Required2 Minimum IBM Spectrum Scale Level Tested3 Minimum IBM Spectrum Scale Level Supported4 7.3 3.10.0-514 3.10.0-514 V4.1.1.11/V4.2.2.1 V4.1.1.11/V4.2.1.2 For technical reasons, this test node will not be added to production. A previous thread http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-April/001529.html indicated that this will be OK. However, it is better to get a clear conclusion before we update other client nodes. Shall we recompile the kernel? Thanks all. Wei Guo ________________________________ UT Southwestern Medical Center The future of medicine, today. From abeattie at au1.ibm.com Thu Jul 6 06:07:07 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 6 Jul 2017 05:07:07 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14992893800360.png Type: image/png Size: 431718 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14992893800362.png Type: image/png Size: 1001127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14993172756190.png Type: image/png Size: 381651 bytes Desc: not available URL: From neil.wilson at metoffice.gov.uk Fri Jul 7 10:18:40 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Fri, 7 Jul 2017 09:18:40 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: Hi Andrew, Have you created new dashboards for GPFS? This shows you how to do it https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Creating%20Grafana%20dashboard Alternatively there are some predefined dashboards here https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Importing%20predefined%20Grafana%20dashboards that you can import and have a play around with? Regards Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. 
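In addition to the dashboards, it can help to confirm that each link in the chain is returning data before looking at Grafana itself. A rough sketch of the checks, run on the node hosting the pmcollector and the bridge; cpu_user is just a metric the default sensors collect, and port 4242 is an assumption, so use whatever port the bridge script was actually started with:

# 1. Does the collector have data at all? Query a basic metric via the CLI.
mmperfmon query cpu_user 10

# 2. Are the sensors pointed at the right collector?
mmperfmon config show | grep -A 3 collectors

# 3. Does the bridge answer OpenTSDB-style requests? Grafana's OpenTSDB
#    data source uses calls like this one to look up metric names.
curl -s "http://localhost:4242/api/suggest?type=metrics&max=10"

If step 1 returns rows but the curl call returns nothing, the problem sits between the bridge and Grafana; if step 1 is already empty, the sensor/collector side needs attention first.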
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 06 July 2017 06:07 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Greetings, I'm currently setting up Grafana to interact with one of our Scale Clusters and i've followed the knowledge centre link in terms of setup. https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm However while everything appears to be working i'm not seeing any data coming through the reports within the grafana server, even though I can see data in the Scale GUI The current environment: [root at sc01n02 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: sc01.spectrum GPFS cluster id: 18085710661892594990 GPFS UID domain: sc01.spectrum Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------ 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon [root at sc01n02 ~]# [root at sc01n02 ~]# mmlsconfig Configuration data for cluster sc01.spectrum: --------------------------------------------- clusterName sc01.spectrum clusterId 18085710661892594990 autoload yes profile gpfsProtocolDefaults dmapiFileHandleSize 32 minReleaseLevel 4.2.2.0 ccrEnabled yes cipherList AUTHONLY maxblocksize 16M [cesNodes] maxMBpS 5000 numaMemoryInterleave yes enforceFilesetQuotaOnRoot yes workerThreads 512 [common] tscCmdPortRange 60000-61000 cesSharedRoot /ibm/cesSharedRoot/ces cifsBypassTraversalChecking yes syncSambaMetadataOps yes cifsBypassShareLocksOnRename yes adminMode central File systems in cluster sc01.spectrum: -------------------------------------- /dev/cesSharedRoot /dev/icos_demo /dev/scale01 [root at sc01n02 ~]# [root at sc01n02 ~]# systemctl status pmcollector ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. Loaded: loaded (/etc/rc.d/init.d/pmcollector) Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago Docs: man:systemd-sysv-generator(8) Main PID: 2693 (ZIMonCollector) CGroup: /system.slice/pmcollector.service ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance mon...... May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor collector... May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance moni...r.. Hint: Some lines were ellipsized, use -l to show in full. From Grafana Server: [cid:image002.jpg at 01D2F70A.17F595F0] when I send a set of files to the cluster (3.8GB) I can see performance metrics within the Scale GUI [cid:image004.jpg at 01D2F70A.17F595F0] yet from the Grafana Dashboard im not seeing any data points [cid:image006.jpg at 01D2F70A.17F595F0] Can anyone provide some hints as to what might be happening? Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 14522 bytes Desc: image002.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 60060 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 25781 bytes Desc: image006.jpg URL: From olaf.weiser at de.ibm.com Fri Jul 7 10:18:13 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 7 Jul 2017 09:18:13 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 431718 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1001127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 381651 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Jul 7 13:01:39 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 7 Jul 2017 12:01:39 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: Just following up on this, has anyone successfully deployed Protocols (SMB) on RHEL 7.3 with the 4.2.3-2 packages? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 July 2017 12:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. 
Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Jul 7 23:32:40 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 7 Jul 2017 15:32:40 -0700 Subject: [gpfsug-discuss] Hold the Date - Spectrum Scale Day @ HPCXXL (Sept 2017, NYC) Message-ID: Hello, More details will be provided as they become available, but just so you can make a placeholder on your calendar, there will be a Spectrum Scale Day the week of September 25th - 29th, likely Thursday, September 28th. This will be a part of the larger HPCXXL meeting (https://www.spxxl.org/?q=New-York-City-2017 ). You may recall this group was formerly called SPXXL and the website is in the process of transitioning to the new name (and getting a new certificate). You will be able to attend *just* the Spectrum Scale day if that is the only portion of the event you would like to attend. The NYC location is great for Spectrum Scale events because many IBMers, including developers, can come in from Poughkeepsie. More as we get closer to the date and details are settled. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Sun Jul 9 08:26:44 2017 From: a.khiredine at meteo.dz (Atmane) Date: Sun, 9 Jul 2017 08:26:44 +0100 Subject: [gpfsug-discuss] GPFS Storage Server (GSS) Message-ID: From a.khiredine at meteo.dz Sun Jul 9 09:00:07 2017 From: a.khiredine at meteo.dz (Atmane) Date: Sun, 9 Jul 2017 09:00:07 +0100 Subject: [gpfsug-discuss] get free space in GSS Message-ID: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From laurence at qsplace.co.uk Sun Jul 9 09:58:05 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Sun, 09 Jul 2017 09:58:05 +0100 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: Message-ID: You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: >Dear all, > >My name is Khiredine Atmane and I am a HPC system administrator at the > >National Office of Meteorology Algeria . We have a GSS24 running >gss2.5.10.3-3b and gpfs-4.2.0.3. 
> >GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks >total, 0 >NVRAM partitions > >disks = 3Tb >SSD = 200 Gb >df -h >Filesystem Size Used Avail Use% Mounted on > >/dev/gpfs1 49T 18T 31T 38% /gpfs1 >/dev/gpfs2 53T 13T 40T 25% /gpfs2 >/dev/gpfs3 25T 4.9T 20T 21% /gpfs3 >/dev/gpfs4 11T 133M 11T 1% /gpfs4 >/dev/gpfs5 323T 34T 290T 11% /gpfs5 > >Total Is 461 To > >I think we have more space >Could anyone make recommendation to troubleshoot find how many free >space >in GSS ? >How to find the available space ? >Thank you! > >Atmane > > > >-- >Atmane Khiredine >HPC System Admin | Office National de la M?t?orologie >T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : >a.khiredine at meteo.dz >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Sun Jul 9 13:26:26 2017 From: a.khiredine at meteo.dz (atmane khiredine) Date: Sun, 9 Jul 2017 12:26:26 +0000 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: , Message-ID: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> thank you very much for replying. I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual 
rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. 
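A convenience note on the commands Laurence suggests above: the per-recovery-group listing can be wrapped in a small shell loop that prints only the free-space rows for every declustered array in the cluster. This is just a sketch; the awk header skip and the grep pattern assume the mmlsrecoverygroup output layout shown in this thread, so adjust them if your release formats the output differently.

# print the free-space line of every declustered array in every recovery group (rough sketch)
for rg in $(mmlsrecoverygroup | awk 'NR>4 {print $1}'); do      # NR>4 skips the header block shown above; adjust if needed
    echo "=== recovery group $rg ==="
    mmlsrecoverygroup "$rg" -L | grep -E '^ *(LOG|DA[0-9]+) '   # keep only the per-array rows that carry the "free space" column
done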
From janfrode at tanso.net Sun Jul 9 17:45:32 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sun, 09 Jul 2017 16:45:32 +0000 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> References: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: You had it here: [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low 12 GiB in DA1, and 4096 MiB i DA2, but effectively you'll get less when you add a raidCode to the vdisk. Best way to use it id to just don't specify a size to the vdisk, and max possible size will be used. -jf s?n. 9. jul. 2017 kl. 14.26 skrev atmane khiredine : > thank you very much for replying. I can not find the free space > > Here is the output of mmlsrecoverygroup > > [root at server1 ~]#mmlsrecoverygroup > > declustered > arrays with > recovery group vdisks vdisks servers > ------------------ ----------- ------ ------- > BB1RGL 3 18 server1,server2 > BB1RGR 3 18 server2,server1 > -------------------------------------------------------------- > [root at server ~]# mmlsrecoverygroup BB1RGL -L > > declustered > recovery group arrays vdisks pdisks format version > ----------------- ----------- ------ ------ -------------- > BB1RGL 3 18 119 4.2.0.1 > > declustered needs replace > scrub background activity > array service vdisks pdisks spares threshold free space > duration task progress priority > ----------- ------- ------ ------ ------ --------- ---------- > -------- ------------------------- > LOG no 1 3 0,0 1 558 GiB 14 > days scrub 51% low > DA1 no 11 58 2,31 2 12 GiB 14 > days scrub 78% low > DA2 no 6 58 2,31 2 4096 MiB 14 > days scrub 10% low > > declustered > checksum > vdisk RAID code array vdisk size block > size granularity state remarks > ------------------ ------------------ ----------- ---------- > ---------- ----------- ----- ------- > gss0_logtip 3WayReplication LOG 128 MiB 1 > MiB 512 ok logTip > gss0_loghome 4WayReplication DA1 40 GiB 1 > MiB 512 ok log > BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 > MiB 32 KiB ok > BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 > MiB 32 KiB ok > BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 > MiB 32 KiB ok > BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 > MiB 32 KiB ok > > config data declustered 
array VCD spares actual rebuild > spare space remarks > ------------------ ------------------ ------------- > --------------------------------- ---------------- > rebuild space DA1 31 34 pdisk > rebuild space DA2 31 35 pdisk > > > config data max disk group fault tolerance actual disk group > fault tolerance remarks > ------------------ --------------------------------- > --------------------------------- ---------------- > rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 > drawer limiting fault tolerance > system index 2 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > > vdisk max disk group fault tolerance actual disk group > fault tolerance remarks > ------------------ --------------------------------- > --------------------------------- ---------------- > gss0_logtip 2 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS4_DATA1 2 drawer 2 drawer > BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS1_DATA1 2 drawer 2 drawer > BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS3_DATA1 2 drawer 2 drawer > BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS2_DATA1 2 drawer 2 drawer > BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS2_DATA2 2 drawer 2 drawer > BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS1_DATA2 2 drawer 2 drawer > BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS5_DATA1 2 drawer 2 drawer > BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS5_DATA2 2 drawer 2 drawer > > active recovery group server servers > ----------------------------------------------- ------- > server1 server1,server2 > > > Atmane Khiredine > HPC System Administrator | Office National de la M?t?orologie > T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : > a.khiredine at meteo.dz > ________________________________ > De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] > Envoy? : dimanche 9 juillet 2017 09:58 > ? : gpfsug main discussion list; atmane khiredine; > gpfsug-discuss at spectrumscale.org > Objet : Re: [gpfsug-discuss] get free space in GSS > > You can check the recovery groups to see if there is any remaining space. > > I don't have access to my test system to confirm the syntax however if > memory serves. > > Run mmlsrecoverygroup to get a list of all the recovery groups then: > > mmlsrecoverygroup -L > > This will list all your declustered arrays and their free space. > > Their might be another method, however this way has always worked well for > me. > > -- Lauz > > > > On 9 July 2017 09:00:07 BST, Atmane wrote: > > Dear all, > > My name is Khiredine Atmane and I am a HPC system administrator at the > National Office of Meteorology Algeria . We have a GSS24 running > gss2.5.10.3-3b and gpfs-4.2.0.3. 
> > GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 > NVRAM partitions > > disks = 3Tb > SSD = 200 Gb > df -h > Filesystem Size Used Avail Use% Mounted on > > /dev/gpfs1 49T 18T 31T 38% /gpfs1 > /dev/gpfs2 53T 13T 40T 25% /gpfs2 > /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 > /dev/gpfs4 11T 133M 11T 1% /gpfs4 > /dev/gpfs5 323T 34T 290T 11% /gpfs5 > > Total Is 461 To > > I think we have more space > Could anyone make recommendation to troubleshoot find how many free space > in GSS ? > How to find the available space ? > Thank you! > > Atmane > > > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Sun Jul 9 17:52:02 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Sun, 9 Jul 2017 12:52:02 -0400 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> References: , <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: Hi Atmane, >> I can not find the free space Based on your output below, your setup currently has two recovery groups BB1RGL and BB1RGR. Issue "mmlsrecoverygroup BB1RGL -L" and "mmlsrecoverygroup BB1RGR -L" to obtain free space in each DA. Based on your "mmlsrecoverygroup BB1RGL -L" output below, BB1RGL "DA1" has 12GiB and "DA2" has 4GiB free space. The metadataOnly and dataOnly vdisk/NSD are created from DA1 and DA2. declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low In addition, you may use "mmlsnsd" to obtain mapping of file-system to vdisk/NSD + use "mmdf " command to query user or available capacity on a GPFS file system. Hope this helps, -Kums From: atmane khiredine To: Laurence Horrocks-Barlow , "gpfsug main discussion list" Date: 07/09/2017 08:27 AM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org thank you very much for replying. 
I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 
drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Mon Jul 10 10:39:27 2017 From: a.khiredine at meteo.dz (Atmane) Date: Mon, 10 Jul 2017 10:39:27 +0100 Subject: [gpfsug-discuss] New Version Of GSS 3.1b 16-Feb-2017 Message-ID: Dear all, There is a new version of GSS Is there someone who made the update ? thanks Lenovo System x GPFS Storage Server (GSS) Version 3.1b 16-Feb-2017 What?s new in Lenovo GSS, Version 3.1 ? New features: - RHEL 7.2 ? GSS Expandability ? Online addition of more JBODs to an existing GSS building block (max. 6 JBOD total) ? Must be same JBOD type and drive type as in the existing building block ? Selectable Spectrum Scale (GPFS) software version and edition ?Four GSS tarballs, for Spectrum Scale {Standard or Advanced Edition} @ {v4.1.1 or v4.2.1} ? Hardware news: ? 10TB drive support: two JBOD MTMs (0796-HCJ/16X and 0796-HCK/17X), drive FRU (01GV110), no drive option ? Withdrawal of the 3TB drive models (0796-HC3/07X and 0796-HC4/08X) ? GSS22 in xConfig (no more need for special-bid) ? Software and firmware news: ? Update of IBM Spectrum Scale v4.2.1 to latest PTF level ? Update of Intel OPA from 10.1 to 10.2 (incl. performance fixes) ? 
Refresh of server and adapter FW levels to Scalable Infrastructure ?16C? recommended levels ? Not much news this time, as ?16C? FW is almost identical to ?16B - List GPFS RPM gpfs.adv-4.2.1-2.12.x86_64.rpm gpfs.base-4.2.1-2.12.x86_64.rpm gpfs.callhome-4.2.1-1.000.el7.noarch.rpm gpfs.callhome-ecc-client-4.2.1-1.000.noarch.rpm gpfs.crypto-4.2.1-2.12.x86_64.rpm gpfs.docs-4.2.1-2.12.noarch.rpm gpfs.ext-4.2.1-2.12.x86_64.rpm gpfs.gnr-4.2.1-2.12.x86_64.rpm gpfs.gnr.base-1.0.0-0.x86_64.rpm gpfs.gpl-4.2.1-2.12.noarch.rpm gpfs.gskit-8.0.50-57.x86_64.rpm gpfs.gss.firmware-4.2.0-5.x86_64.rpm gpfs.gss.pmcollector-4.2.2-2.el7.x86_64.rpm gpfs.gss.pmsensors-4.2.2-2.el7.x86_64.rpm gpfs.gui-4.2.1-2.3.noarch.rpm gpfs.java-4.2.2-2.x86_64.rpm gpfs.msg.en_US-4.2.1-2.12.noarch.rpm -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From Greg.Lehmann at csiro.au Tue Jul 11 05:54:39 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Tue, 11 Jul 2017 04:54:39 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: <4c9ae144c1114b85b7f2cdc27eefd749@exch1-cdc.nexus.csiro.au> Yes, although it is early days for us and I would not say we have finished testing as yet. We have upgraded twice to get there from 4.2.3-0. It seems OK and I have not noticed any changes from 4.2.3.0. Greg From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: Friday, 7 July 2017 10:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Just following up on this, has anyone successfully deployed Protocols (SMB) on RHEL 7.3 with the 4.2.3-2 packages? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 July 2017 12:12 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. 
This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From heiner.billich at psi.ch Tue Jul 11 10:36:39 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Tue, 11 Jul 2017 09:36:39 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA Message-ID: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch>
Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides - 6 GB/s. Does AFM's NFS client on gateway nodes support NFS using RDMA?
I would like to try. Or should we try to tune NFS and the IP stack? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes?
We can't use a native gpfs multicluster mount - this links home and cache much too strong: if home fails cache will unmount the cache fileset - this is what I get from the manuals.
We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich
-- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02
From abeattie at au1.ibm.com Tue Jul 11 11:14:37 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Tue, 11 Jul 2017 10:14:37 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Message-ID: An HTML attachment was scrubbed... URL:
From bbanister at jumptrading.com Tue Jul 11 15:46:42 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Jul 2017 14:46:42 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Message-ID: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com>
Sounds like a very interesting topic for an upcoming GPFS UG meeting... say SC'17? -B
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Tuesday, July 11, 2017 5:15 AM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org; jake.carroll at uq.edu.au Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA
Billich, Reach out to Jake Carroll at Uni of QLD. UQ have been playing with NFS over 10Gb / 40Gb and 100Gb Ethernet and there is LOTS of tuning that you can do to improve how things work. Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com
----- Original message ----- From: "Billich Heinrich Rainer (PSI)" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [gpfsug-discuss] does AFM support NFS via RDMA Date: Tue, Jul 11, 2017 7:36 PM
Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides - 6 GB/s. Does AFM's NFS client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune NFS and the IP stack?
I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Tue Jul 11 23:07:49 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 11 Jul 2017 15:07:49 -0700 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Message-ID: <9BA6A8E3-D633-4DFF-826F-5ACE49361694@lbl.gov> Sounds good. Is someone willing to take on this talk? User-driven talks on real experiences are always welcome. Cheers, Kristy > On Jul 11, 2017, at 7:46 AM, Bryan Banister wrote: > > Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? > -B > > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org ] On Behalf Of Andrew Beattie > Sent: Tuesday, July 11, 2017 5:15 AM > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org ; jake.carroll at uq.edu.au > Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA > > Bilich, > > Reach out to Jake Carrol at Uni of QLD > > UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet > and there is LOTS of tuning that you can do to improve how things work > > Regards, > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: "Billich Heinrich Rainer (PSI)" > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org " > > Cc: > Subject: [gpfsug-discuss] does AFM support NFS via RDMA > Date: Tue, Jul 11, 2017 7:36 PM > > Hello, > > We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? 
> > We can't use a native gpfs multicluster mount - this links home and cache much too strong: if home fails cache will unmount the cache fileset - this is what I get from the manuals.
> > We run spectrum scale 4.2.2/4.2.3 on Redhat 7.
> > Thank you,
> > Heiner Billich
> > -- > Paul Scherrer Institut > Heiner Billich > WHGA 106 > CH 5232 Villigen > 056 310 36 02
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From Robert.Oesterlin at nuance.com Wed Jul 12 17:06:40 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 12 Jul 2017 16:06:40 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Message-ID:
Interesting. Performance is one thing, but how usable. IBM, watch your back :-)
"WekaIO is the world's fastest distributed file system, processing four times the workload compared to IBM Spectrum Scale measured on Standard Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. Utilizing only 120 cloud compute instances with locally attached storage, WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM's high-end FlashSystem 900."
https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/
Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From oehmes at gmail.com Wed Jul 12 18:24:19 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 12 Jul 2017 17:24:19 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID:
While I really like competition on SpecSFS, the claims from the WekaIO people are, let's say, 'alternative facts' at best. The Spectrum Scale results were done on 4 nodes with 2 flash storage devices attached; they compare this to a WekaIO system with 14 times more memory (14 TB vs 1 TB), 120 SSDs (vs 64 FlashCore Modules) across 15 times more compute nodes (60 vs 4). Said all this, the article claims 1,000 builds, while the actual submission only delivers 500 --> https://www.spec.org/sfs2014/results/sfs2014.html So they need 14 times more memory and cores and 2 times the flash to show twice as many builds at double the response time; I leave it to everybody who understands these facts to judge how great that result really is.
Said all this, Spectrum Scale scales almost linear if you double the nodes , network and storage accordingly, so there is no reason to believe we couldn't easily beat this, its just a matter of assemble the HW in a lab and run the test. btw we scale to 10k+ nodes , 2500 times the number we used in our publication :-D Sven On Wed, Jul 12, 2017 at 9:06 AM Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > Interesting. Performance is one thing, but how usable. IBM, watch your > back :-) > > > > *?WekaIO is the world?s fastest distributed file system, processing four > times the workload compared to IBM Spectrum Scale measured on Standard > Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry > benchmark. Utilizing only 120 cloud compute instances with locally attached > storage, WekaIO completed 1,000 simultaneous software builds compared to > 240 on IBM?s high-end FlashSystem 900.?* > > > > > https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 <(507)%20269-0413> > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Jul 12 19:20:06 2017 From: ewahl at osc.edu (Edward Wahl) Date: Wed, 12 Jul 2017 14:20:06 -0400 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: <20170712142006.297cc9f2@osc.edu> Ah benchmarks... There are Lies, damn Lies, and then benchmarks. I've been in HPC a while on both the vendor (Cray) and customer side, and until I see Lustre, BeeGFS, Spectrum Scale, StorNext, OrangeFS, CEPH, Gluster, 'Flash in the pan v1', etc. all run on the EXACT same hardware I take ALL benchmarks with a POUND of salt. Too easy to finagle whatever result you want. Besides, benchmarks and real world performance are vastly different unless you are using IO kernels based on your local apps as your benchmark. I have a feeling MANY of the folks on this list feel similarly. ;) I recall when we figured out how someone cheated a SPEC test once by only using the inner-track of drives. ^_^ Ed On Wed, 12 Jul 2017 16:06:40 +0000 "Oesterlin, Robert" wrote: > Interesting. Performance is one thing, but how usable. IBM, watch your > back :-) > > ?WekaIO is the world?s fastest distributed file system, processing four times > the workload compared to IBM Spectrum Scale measured on Standard Performance > Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. > Utilizing only 120 cloud compute instances with locally attached storage, > WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM?s > high-end FlashSystem 900.? > > https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From r.sobey at imperial.ac.uk Wed Jul 12 19:20:32 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 12 Jul 2017 18:20:32 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: I'm reading it as "WeakIO" which probably isn't a good thing.. 
both in the context of my eyesight and the negative connotation of the product :)
________________________________
From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Oesterlin, Robert Sent: 12 July 2017 17:06 To: gpfsug main discussion list Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System
Interesting. Performance is one thing, but how usable. IBM, watch your back :-)
"WekaIO is the world's fastest distributed file system, processing four times the workload compared to IBM Spectrum Scale measured on Standard Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. Utilizing only 120 cloud compute instances with locally attached storage, WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM's high-end FlashSystem 900."
https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/
Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From Robert.Oesterlin at nuance.com Wed Jul 12 19:27:12 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 12 Jul 2017 18:27:12 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Message-ID: <92349D18-3614-4235-B30C-ADCCE3782CDD@nuance.com>
Ah yes - Sven keeping us honest! Bob Oesterlin Sr Principal Storage Engineer, Nuance
From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Wednesday, July 12, 2017 at 12:24 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System
while i really like competition on SpecSFS, the claims from the WekaIO people are lets say 'alternative facts' at best The Spectrum Scale results were done on 4 Nodes with 2 Flash Storage devices attached, they compare this to a WekaIO system with 14 times more memory (14 TB vs 1TB) , 120 SSD's (vs 64 Flashcore Modules) across 15 times more compute nodes (60 vs 4) . said all this, the article claims 1000 builds, while the actual submission only delivers 500 --> https://www.spec.org/sfs2014/results/sfs2014.html
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From sannaik2 at in.ibm.com Fri Jul 14 06:55:30 2017 From: sannaik2 at in.ibm.com (Sandeep Naik1) Date: Fri, 14 Jul 2017 11:25:30 +0530 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: , <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID:
Hi Atmane, There can be two meanings of available free space. One is what is available in the existing file systems. For this you rightly referred to the df -h command output. This is the actual free space available in the already-created file systems.
Filesystem Size Used Avail Use% Mounted on
/dev/gpfs1 49T 18T 31T 38% /gpfs1
/dev/gpfs2 53T 13T 40T 25% /gpfs2
/dev/gpfs3 25T 4.9T 20T 21% /gpfs3
/dev/gpfs4 11T 133M 11T 1% /gpfs4
/dev/gpfs5 323T 34T 290T 11% /gpfs5
The other is the free space available in the DA (declustered array). For that, as everyone said, use mmlsrecoverygroup <recovery group> -L. Please note that this will give you the raw free capacity; for the usable free capacity in a DA you have to account for the RAID overhead. But based on your output you have very little/no free space left in the DAs.
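To make Sandeep's two views of free space concrete, here is a rough pair of commands using names from this thread (gpfs5 and BB1RGL are simply the examples above; option support such as --block-size may vary by release):

mmdf gpfs5 --block-size auto     # file-system view: free space inside an existing file system (what df -h reflects)
mmlsrecoverygroup BB1RGL -L      # declustered-array view: raw free space per DA (the "free space" column)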
[root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low Thanks, Sandeep Naik Elastic Storage server / GPFS Test ETZ-B, Hinjewadi Pune India (+91) 8600994314 From: "Kumaran Rajaram" To: gpfsug main discussion list , atmane khiredine Date: 09/07/2017 10:22 PM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Atmane, >> I can not find the free space Based on your output below, your setup currently has two recovery groups BB1RGL and BB1RGR. Issue "mmlsrecoverygroup BB1RGL -L" and "mmlsrecoverygroup BB1RGR -L" to obtain free space in each DA. Based on your "mmlsrecoverygroup BB1RGL -L" output below, BB1RGL "DA1" has 12GiB and "DA2" has 4GiB free space. The metadataOnly and dataOnly vdisk/NSD are created from DA1 and DA2. declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low In addition, you may use "mmlsnsd" to obtain mapping of file-system to vdisk/NSD + use "mmdf " command to query user or available capacity on a GPFS file system. Hope this helps, -Kums From: atmane khiredine To: Laurence Horrocks-Barlow , "gpfsug main discussion list" Date: 07/09/2017 08:27 AM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org thank you very much for replying. 
I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 
drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jul 17 13:13:58 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 17 Jul 2017 12:13:58 +0000 Subject: [gpfsug-discuss] Job Vacancy: Research Storage Systems Senior Specialist/Specialist Message-ID: Hi all, Members of this group may be particularly interested in the role "Research Storage Systems Senior Specialist/Specialist"... As part of the University of Birmingham's investment in our ability to support outstanding research by providing technical computing facilities, we are expanding the team and currently have 6 vacancies. I've provided a short description of each post, but please do follow the links where you will find the full job description attached at the bottom of the page. For some of the posts, they are graded either at 7 or 8 and will be appointed based upon skills and experience, the expectation is that if the appointment is made at grade 7 that as the successful candidate grows into the role, we should be able to regrade up. 
Research Storage Systems Senior Specialist/Specialist: https://goo.gl/NsL1EG Responsible for the delivery and maintenance of research storage systems, focussed on the delivery of Spectrum Scale storage systems and data protection. (this is available either as a grade 8 or grade 7 post depending on skills and experience so may suit someone wishing to grow into the senior role) HPC Specialist post (Research Systems Administrator / Senior Research Systems Administrator): https://goo.gl/1SxM4j Helping to deliver and operationally support the technical computing environments, with a focus on supporting and delivery of HPC and HTC services. (this is available either as a grade 7 or grade 8 post depending on skills and experience so may suit someone wishing to grow into the senior role) Research Computing (Analytics): https://goo.gl/uCNdMH Helping our researchers to understand data analytics and supporting their research Senior Research Software Engineer: https://goo.gl/dcGgAz Working with research groups to develop and deliver bespoke software solutions to support their research Research Training and Engagement Officer: https://goo.gl/U48m7z Helping with the delivery and coordination of training and engagement works to support users helping ensure they are able to use the facilities to support their research. Research IT Partner in the College of Arts and Law: https://goo.gl/A7czEA Providing technical knowledge and skills to support project delivery through research bid preparation to successful solution delivery. Simon From cgirda at wustl.edu Mon Jul 17 20:40:42 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Mon, 17 Jul 2017 14:40:42 -0500 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana Message-ID: Hello Team, This is Chakri from Washu at STL. Thank you for the great opportunity to join this group. I am trying to setup performance monitoring for our GPFS cluster. As part of the project configured pmcollector and pmsensors on our GPFS cluster. 1. Created a 'spectrumscale' data-source bridge on our grafana ( NOT SET TO DEFAULT ) https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm 2. Created a new dash-board by importing the pre-built dashboard. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Importing%20predefined%20Grafana%20dashboards Here is the issue. I don't get any graph updates if I don't set "spectrumscale" as DEFAULT data-source but that is breaking rest of the graphs ( we have ton of dashboards). So I had to uncheck the "spectrumscale" as default data-source. If I go and set the "data-source" manually to "spectrumscale" on the pre-built dashboard graphs. I see the wheel spinning but no updates. Any ideas? Thank you Chakri From Robert.Oesterlin at nuance.com Tue Jul 18 12:45:38 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 18 Jul 2017 11:45:38 +0000 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana Message-ID: Hi Chakri If you?re getting the ?ole ?spinning wheel? on your dashboard, then it?s one of two things: 1) The Grafana bridge is not running 2) The dashboard is requesting a metric that isn?t available. 
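A quick way to check both of those from a shell is sketched below. This is an illustration only, not a documented procedure: it assumes the bridge emulates the OpenTSDB HTTP API on its default port 4242 (which is what the Grafana data source talks to), and pmcollector-host is a placeholder for wherever the bridge is running.

# 1) is the bridge answering at all? expect an HTTP 200 back
curl -s -o /dev/null -w "%{http_code}\n" "http://pmcollector-host:4242/api/suggest?type=metrics&q=cpu&max=5"

# 2) does the collector know about the metrics the panel is requesting?
#    (port 4242 and the OpenTSDB-style /api/suggest endpoint are assumptions; adjust to your bridge config)
curl -s "http://pmcollector-host:4242/api/suggest?type=metrics&q=gpfs_fs&max=25"

If the first call does not return 200 the bridge itself is the problem; if the second returns an empty list, the dashboard is asking for a metric the sensors are not collecting.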
Assuming that you?ve verified that the pmcollector/pmsensor setup is work right in your cluster, I?d then start looking at the log files for the Grafana Bridge and the pmcollector to see if you can determine if either is producing an error - like the metric wasn?t found. The other thing to try is setup a small test graph with a known metric being collected by you pmsensor configuration, rather than try one of Helene?s default dashboards, which are fairly complex. Drop me a note directly if you need to. Bob Oesterlin Sr Principal Storage Engineer, Nuance From cgirda at wustl.edu Tue Jul 18 15:57:05 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Tue, 18 Jul 2017 09:57:05 -0500 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana In-Reply-To: References: Message-ID: Bob, Found the issue to be with https is getting blocked with "direct" connection. Switched it to proxy on the bridge-port. That helped and now I can see graphs. Thank you Chakri On 7/18/17 6:45 AM, Oesterlin, Robert wrote: > Hi Chakri > > If you?re getting the ?ole ?spinning wheel? on your dashboard, then it?s one of two things: > > 1) The Grafana bridge is not running > 2) The dashboard is requesting a metric that isn?t available. > > Assuming that you?ve verified that the pmcollector/pmsensor setup is work right in your cluster, I?d then start looking at the log files for the Grafana Bridge and the pmcollector to see if you can determine if either is producing an error - like the metric wasn?t found. The other thing to try is setup a small test graph with a known metric being collected by you pmsensor configuration, rather than try one of Helene?s default dashboards, which are fairly complex. > > Drop me a note directly if you need to. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Tue Jul 18 18:21:06 2017 From: david_johnson at brown.edu (David Johnson) Date: Tue, 18 Jul 2017 13:21:06 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited Message-ID: We also noticed a fair amount of CPU time accumulated by mmsysmon.py on our diskless compute nodes. I read the earlier query, where it was answered: > ces == Cluster Export Services, mmsysmon.py comes from mmcesmon. It is used for managing export services of GPFS. If it is killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't attempt to kill them. > Our question is this ? we don?t run the latest ?protocols", our NFS is CNFS, and our CIFS is clustered CIFS. I can understand it might be needed with Ganesha, but on every node? Why in the world would I be getting this daemon running on all client nodes, when I didn?t install the ?protocols" version of the distribution? We have release 4.2.2 at the moment. How can we disable this? Thanks, ? ddj -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jul 18 18:51:21 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 18 Jul 2017 17:51:21 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: There?s no official way to cleanly disable it so far as I know yet; but you can defacto disable it by deleting /var/mmfs/mmsysmon/mmsysmonitor.conf. It?s a huge problem. 
I don?t understand why it hasn?t been given much credit by dev or support. ~jonathon On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of David Johnson" wrote: We also noticed a fair amount of CPU time accumulated by mmsysmon.py on our diskless compute nodes. I read the earlier query, where it was answered: ces == Cluster Export Services, mmsysmon.py comes from mmcesmon. It is used for managing export services of GPFS. If it is killed, your nfs/smb etc will be out of work. Their overhead is small and they are very important. Don't attempt to kill them. Our question is this ? we don?t run the latest ?protocols", our NFS is CNFS, and our CIFS is clustered CIFS. I can understand it might be needed with Ganesha, but on every node? Why in the world would I be getting this daemon running on all client nodes, when I didn?t install the ?protocols" version of the distribution? We have release 4.2.2 at the moment. How can we disable this? Thanks, ? ddj From S.J.Thompson at bham.ac.uk Tue Jul 18 20:21:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 18 Jul 2017 19:21:46 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: So just following up on my questions from January. We tried to do 2. I.e. Restore to a new file-system with different block sizes. It got part way through creating the file-sets on the new SOBAR file-system and then GPFS asserts and crashes... We weren't actually intentionally trying to move block sizes, but because we were restoring from a traditional SAN based system to a shiny new GNR based system, we'd manually done the FS create steps. I have a PMR open now. I don't know if someone internally in IBM actually tried this after my emails, as apparently there is a similar internal defect which is ~6 months old... Simon From: > on behalf of Marc A Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 20 January 2017 at 17:57 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. From: "Simon Thompson (Research Computing - IT Services)" > To: "gpfsug-discuss at spectrumscale.org" > Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? 
We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From leslie.james.elliott at gmail.com Wed Jul 19 08:22:49 2017 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Wed, 19 Jul 2017 17:22:49 +1000 Subject: [gpfsug-discuss] AFM over NFS Message-ID: we are having a problem linking a target to a fileset we are able to manually connect with NFSv4 to the correct path on an NFS export down a particular subdirectory path, but when when we create a fileset with this same path as an afmTarget it connects with NFSv3 and actually connects to the top of the export even though mmafmctl displays the extended path information are we able to tell AFM to connect with NFSv4 in any way to work around this problem the NFS comes from a closed system, we can not change the configuration on it to fix the problem on the target thanks leslie -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Wed Jul 19 08:53:58 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 19 Jul 2017 07:53:58 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: I?m having a play with this now too. Has anybody coded a systemd unit to handle step 2b in the knowledge centre article ? bridge creation on the gpfs side? It would save me a bit of effort. I?m also wondering about the CherryPy version. It looks like this has been developed on SLES which has the newer version mentioned as a standard package and yet RHEL with an older version of CherryPy is perhaps more common as it seems to have the best support for features of GPFS, like object and block protocols. Maybe SLES is in favour now? 
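Something along these lines is roughly what I mean, as a sketch only; the install path, service name and collector host are assumptions to adjust to your setup, and the Restart settings are there because the bridge exits if the local pmcollector is not up yet:

cat > /etc/systemd/system/gpfs-grafana-bridge.service <<'EOF'
[Unit]
Description=IBM Spectrum Scale Grafana bridge (zimonGrafanaIntf.py)
# start after the collector; path and host below are assumptions, adjust to your install
After=network-online.target pmcollector.service
Wants=pmcollector.service

[Service]
Type=simple
WorkingDirectory=/opt/IBM/bridge
ExecStart=/usr/bin/python /opt/IBM/bridge/zimonGrafanaIntf.py -s localhost
# the bridge aborts if pmcollector is still initialising, so retry with a delay
Restart=on-failure
RestartSec=120

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now gpfs-grafana-bridge.service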
Cheers, Greg

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie
Sent: Thursday, 6 July 2017 3:07 PM
To: gpfsug-discuss at spectrumscale.org
Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data

Greetings,

I'm currently setting up Grafana to interact with one of our Scale Clusters and I've followed the knowledge centre link in terms of setup.

https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm

However, while everything appears to be working, I'm not seeing any data coming through the reports within the Grafana server, even though I can see data in the Scale GUI.

The current environment:

[root at sc01n02 ~]# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         sc01.spectrum
  GPFS cluster id:           18085710661892594990
  GPFS UID domain:           sc01.spectrum
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name  IP address  Admin node name  Designation
 ------------------------------------------------------------------
   1   sc01n01           10.2.12.11  sc01n01          quorum-manager-perfmon
   2   sc01n02           10.2.12.12  sc01n02          quorum-manager-perfmon
   3   sc01n03           10.2.12.13  sc01n03          quorum-manager-perfmon

[root at sc01n02 ~]#

[root at sc01n02 ~]# mmlsconfig
Configuration data for cluster sc01.spectrum:
---------------------------------------------
clusterName sc01.spectrum
clusterId 18085710661892594990
autoload yes
profile gpfsProtocolDefaults
dmapiFileHandleSize 32
minReleaseLevel 4.2.2.0
ccrEnabled yes
cipherList AUTHONLY
maxblocksize 16M
[cesNodes]
maxMBpS 5000
numaMemoryInterleave yes
enforceFilesetQuotaOnRoot yes
workerThreads 512
[common]
tscCmdPortRange 60000-61000
cesSharedRoot /ibm/cesSharedRoot/ces
cifsBypassTraversalChecking yes
syncSambaMetadataOps yes
cifsBypassShareLocksOnRename yes
adminMode central

File systems in cluster sc01.spectrum:
--------------------------------------
/dev/cesSharedRoot
/dev/icos_demo
/dev/scale01
[root at sc01n02 ~]#

[root at sc01n02 ~]# systemctl status pmcollector
● pmcollector.service - LSB: Start the ZIMon performance monitor collector.
   Loaded: loaded (/etc/rc.d/init.d/pmcollector)
   Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago
     Docs: man:systemd-sysv-generator(8)
 Main PID: 2693 (ZIMonCollector)
   CGroup: /system.slice/pmcollector.service
           ├─2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg...
           └─2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth...

May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance mon......
May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor collector...
May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance moni...r..
Hint: Some lines were ellipsized, use -l to show in full.

From Grafana Server: [screenshot image002.jpg]

when I send a set of files to the cluster (3.8GB) I can see performance metrics within the Scale GUI [screenshot image004.jpg]

yet from the Grafana Dashboard I'm not seeing any data points [screenshot image006.jpg]

Can anyone provide some hints as to what might be happening?

Regards,

Andrew Beattie
Software Defined Storage - IT Specialist
Phone: 614-2133-7927
E-mail: abeattie at au1.ibm.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 19427 bytes Desc: image002.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 84412 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 37285 bytes Desc: image006.jpg URL: From janfrode at tanso.net Wed Jul 19 12:09:48 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 19 Jul 2017 11:09:48 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: Nils Haustein did such a migration from v7000 Unified to ESS last year. Used SOBAR to avoid recalls from HSM. I believe he wrote a whitepaper on the process.. -jf tir. 18. jul. 2017 kl. 21.21 skrev Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk>: > So just following up on my questions from January. > > We tried to do 2. I.e. Restore to a new file-system with different block > sizes. It got part way through creating the file-sets on the new SOBAR > file-system and then GPFS asserts and crashes... We weren't actually > intentionally trying to move block sizes, but because we were restoring > from a traditional SAN based system to a shiny new GNR based system, we'd > manually done the FS create steps. > > I have a PMR open now. I don't know if someone internally in IBM actually > tried this after my emails, as apparently there is a similar internal > defect which is ~6 months old... > > Simon > > From: on behalf of Marc A > Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: Friday, 20 January 2017 at 17:57 > > To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions > > I worked on some aspects of SOBAR, but without studying and testing the > commands - I'm not in a position right now to give simple definitive > answers - > having said that.... > > Generally your questions are reasonable and the answer is: "Yes it should > be possible to do that, but you might be going a bit beyond the design > point.., > so you'll need to try it out on a (smaller) test system with some smaller > tedst files. > > Point by point. > > 1. If SOBAR is unable to restore a particular file, perhaps because the > premigration did not complete -- you should only lose that particular file, > and otherwise "keep going". > > 2. I think SOBAR helps you build a similar file system to the original, > including block sizes. So you'd have to go in and tweak the file system > creation step(s). > I think this is reasonable... If you hit a problem... IMO that would be a > fair APAR. > > 3. Similar to 2. > > > > > > From: "Simon Thompson (Research Computing - IT Services)" < > S.J.Thompson at bham.ac.uk> > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 01/20/2017 10:44 AM > Subject: [gpfsug-discuss] SOBAR questions > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We've recently been looking at deploying SOBAR to support DR of some of > our file-systems, I have some questions (as ever!) that I can't see are > clearly documented, so was wondering if anyone has any insight on this. > > 1. If we elect not to premigrate certain files, are we still able to use > SOBAR? 
We are happy to take a hit that those files will never be available > again, but some are multi TB files which change daily and we can't stream > to tape effectively. > > 2. When doing a restore, does the block size of the new SOBAR'd to > file-system have to match? For example the old FS was 1MB blocks, the new > FS we create with 2MB blocks. Will this work (this strikes me as one way > we might be able to migrate an FS to a new block size?)? > > 3. If the file-system was originally created with an older GPFS code but > has since been upgraded, does restore work, and does it matter what client > code? E.g. We have a file-system that was originally 3.5.x, its been > upgraded over time to 4.2.2.0. Will this work if the client code was say > 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 > (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file > system version". Say there was 4.2.2.5 which created version 16.01 > file-system as the new FS, what would happen? > > This sort of detail is missing from: > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s > cale.v4r22.doc/bl1adv_sobarrestore.htm > > But is probably quite important for us to know! > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 19 12:26:43 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 19 Jul 2017 11:26:43 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: Getting this: python zimonGrafanaIntf.py ?s < pmcollector host> via system is a bit of a tricky process, since this process will abort unless the pmcollector is fully up. With a large database, I?ve seen it take 3-5 mins for pmcollector to fully initialize. I?m sure a simple ?sleep and try again? wrapper would take care of that. It?s on my lengthy to-do list! On the CherryPy version - I run the bridge on my RH/Centos system with python 3.4 and used ?pip install cherrypy? and it picked up the latest version. Seems to work just fine. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Greg.Lehmann at csiro.au" Reply-To: gpfsug main discussion list Date: Wednesday, July 19, 2017 at 2:54 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data I?m having a play with this now too. Has anybody coded a systemd unit to handle step 2b in the knowledge centre article ? bridge creation on the gpfs side? It would save me a bit of effort. I?m also wondering about the CherryPy version. It looks like this has been developed on SLES which has the newer version mentioned as a standard package and yet RHEL with an older version of CherryPy is perhaps more common as it seems to have the best support for features of GPFS, like object and block protocols. Maybe SLES is in favour now? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From MDIETZ at de.ibm.com Wed Jul 19 14:05:49 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Wed, 19 Jul 2017 15:05:49 +0200 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. > > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? 
ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Wed Jul 19 14:28:23 2017 From: david_johnson at brown.edu (David Johnson) Date: Wed, 19 Jul 2017 09:28:23 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: I have opened a PMR, and the official response reflects what you just posted. In addition, it seems there are some performance issues with Python 2 that will be improved with eventual migration to Python 3. I was unaware of the mmhealth functions that the mmsysmon daemon provides. The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded. I suppose it would be possible to turn off mmsysmon during the benchmarking, but I appreciate the effort at streamlining the monitor service. Cutting back on fork/exec, better python, less polling, more notifications? all good. Thanks for the details, ? ddj > On Jul 19, 2017, at 9:05 AM, Mathias Dietz wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. 
> > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdharris at us.ibm.com Wed Jul 19 15:40:17 2017 From: mdharris at us.ibm.com (Michael D Harris) Date: Wed, 19 Jul 2017 10:40:17 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: Hi David, Re: "The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded." MPI workloads show the most mmhealth impact. Specifically the more sensitive the workload is to jitter the higher the potential impact. The mmhealth config interval, as per Mathias's link, is a scalar applied to all monitor interval values in the configuration file. As such it currently modifies the server side monitoring and health reporting in addition to mitigating mpi client impact. So "medium" == 5 is a good perhaps reasonable value - whereas the "slow" == 10 scalar may be too infrequent for your server side monitoring and reporting (so your 30 second update becomes 5 minutes). The clock alignment that Mathias mentioned is a new investigatory undocumented tool for MPI workloads. It nearly completely removes all mmhealth MPI jitter while retaining default monitor intervals. It also naturally generates thundering herds of all client reporting to the quorum nodes. So while you may mitigate the client MPI jitter you may severely impact the server throughput on those intervals if not also exceed connection and thread limits. Configuring "clients" separately from "servers" without resorting to alignment is another area of investigation. I'm not familiar with your PMR but as Mathias mentioned "mmhealth config interval medium" would be a good start. In testing that Kums and I have done the "mmhealth config interval medium" value provides mitigation almost as good as the mentioned clock alignment for MPI for say a psnap with barrier type workload . 
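To make that concrete, a rough sketch of checking the cost on a client and then applying the change discussed above; the pgrep/pidstat sampling is just a convenience and not from the original reply, and the mmhealth syntax should be double-checked against the Knowledge Center page Mathias linked:

# sample the monitor's CPU use on a client node for about a minute
pid=$(pgrep -of mmsysmon.py)
pidstat -p "$pid" 10 6          # or: top -b -d 10 -n 6 -p "$pid"

# relax the monitoring interval (4.2.3+), as suggested above, then confirm
# that health reporting still updates at an acceptable rate
mmhealth config interval medium
mmhealth node show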
Regards, Mike Harris IBM Spectrum Scale - Core Team From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/19/2017 09:28 AM Subject: gpfsug-discuss Digest, Vol 66, Issue 30 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: mmsysmon.py revisited (Mathias Dietz) 2. Re: mmsysmon.py revisited (David Johnson) ---------------------------------------------------------------------- Message: 1 Date: Wed, 19 Jul 2017 15:05:49 +0200 From: "Mathias Dietz" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited Message-ID: Content-Type: text/plain; charset="iso-8859-1" thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. 
> > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/8c0e33e9/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 19 Jul 2017 09:28:23 -0400 From: David Johnson To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited Message-ID: Content-Type: text/plain; charset="utf-8" I have opened a PMR, and the official response reflects what you just posted. In addition, it seems there are some performance issues with Python 2 that will be improved with eventual migration to Python 3. I was unaware of the mmhealth functions that the mmsysmon daemon provides. The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded. I suppose it would be possible to turn off mmsysmon during the benchmarking, but I appreciate the effort at streamlining the monitor service. Cutting back on fork/exec, better python, less polling, more notifications? all good. Thanks for the details, ? ddj > On Jul 19, 2017, at 9:05 AM, Mathias Dietz wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. 
(mmhealth config interval) > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm < https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss < http://gpfsug.org/mailman/listinfo/gpfsug-discuss> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/669c525b/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 66, Issue 30 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathon.anderson at colorado.edu Wed Jul 19 18:52:14 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 19 Jul 2017 17:52:14 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? ~jonathon On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. 
> > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Wed Jul 19 19:12:37 2017 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Wed, 19 Jul 2017 14:12:37 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. 
> > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. >> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. >> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Wed Jul 19 19:29:22 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 19 Jul 2017 18:29:22 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> References: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> Message-ID: OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU for packet processing, whereas Mellanox IB uses a discrete asic on the HBA. As a result, OPA is much more sensitive to task placement and interrupts, in our experience, because the host CPU load competes with the fabric IO processing load. 
~jonathon On 7/19/17, 12:12 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu" wrote: We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. 
>> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. >> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From john.hearns at asml.com Thu Jul 20 08:39:29 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 20 Jul 2017 07:39:29 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> Message-ID: This is really interesting. I know we can look at the interrupt rates of course, but is there a way we can quantify the effects of interrupts / OS jitter here? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: Wednesday, July 19, 2017 8:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU for packet processing, whereas Mellanox IB uses a discrete asic on the HBA. As a result, OPA is much more sensitive to task placement and interrupts, in our experience, because the host CPU load competes with the fabric IO processing load. ~jonathon On 7/19/17, 12:12 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu" wrote: We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. 
Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_4.2.3%2Fcom.ibm.spectrum.scale.v4r23.doc%2Fbl1adm_mmhealth.htm&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=Uzdg4ogcQwidNfi8TMp%2FdCMqnSLTFxU4y8n2ub%2F28xQ%3D&reserved=0 > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. >> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. 
>> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From MDIETZ at de.ibm.com Thu Jul 20 10:30:50 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Thu, 20 Jul 2017 11:30:50 +0200 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: Jonathon, its important to separate the two issues "high CPU consumption" and "CPU Jitter". As mentioned, we are aware of the CPU jitter issue and already put several improvements in place. (more to come with the next release) Did you try with a lower polling frequency and/or enabling clock alignment as Mike suggested ? Non-MPI workloads are usually not impacted by CPU jitter, but might be impacted by high CPU consumption. 
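To put numbers on the consumption side before going to support, sampling the monitor process with pidstat is usually enough. A minimal sketch - it assumes the sysstat package is installed, a single mmsysmon python process, and an arbitrary log path:

    # sample mmsysmon CPU usage every 10 seconds for an hour
    PID=$(pgrep -f mmsysmon | head -1)
    pidstat -u -h -p "$PID" 10 360 > /var/tmp/mmsysmon-cpu.log
    # cumulative CPU time used so far, straight from ps
    ps -o pid,etime,cputime,pcpu,comm -p "$PID"

That output, together with the mmhealth interval in use, gives support something concrete to compare against the lab numbers.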
But we don't see such such high CPU consumption in the lab and therefore ask affected customers to get in contact with IBM support to find the root cause. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/19/2017 07:52:14 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/19/2017 07:52 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > It might be a problem specific to your system environment or a > wrong configuration therefore please get in contact with IBM support > to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing > jitter in conflict with MPI on the shared Intel Omni-Path network, > in our case. > > We?ve already tried pursuing support on this through our vendor, > DDN, and got no-where. Eventually we were the ones who tried killing > mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU > consumption by mmsysmon on our test systems? isn?t helping. Do you > have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Mathias Dietz" on behalf of MDIETZ at de.ibm.com> wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for > the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because > it monitors the individual components and provides health state > information and error events. > > This information is needed by other Spectrum Scale components > (mmhealth command, the IBM Spectrum Scale GUI, Support tools, > Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > > much credit by dev or support. > > Over the last couple of month, the development team has put a > strong focus on this topic. > > In order to monitor the health of the individual components, > mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and > replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the > ability to configure the polling frequency to reduce the overhead. > (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/ > com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the > monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by > mmsysmon on our test systems. > > It might be a problem specific to your system environment or a > wrong configuration therefore please get in contact with IBM support > to analyze the root cause of the high usage. 
> > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by > mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on > every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 15:57:14 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 09:57:14 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR Message-ID: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> Hi There, I was running a bridge port services to push my stats to grafana. It was running fine until we started some rigorous IOPS testing on the cluster. Now its failing to start with the following error. Questions: 1. Any clues on it fix? 2. Is there anyway I can run this in a service/daemon mode rather than running in a screen session? 
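On question 2: the bridge is just a long-running python process, so a small systemd unit removes the need for a screen session and restarts it if it dies. This is only a sketch - the unit name, install path, python interpreter and collector host below are assumptions, adjust them to wherever zimonGrafanaIntf.py actually lives:

    # /etc/systemd/system/zimon-grafana-bridge.service  (name is arbitrary)
    [Unit]
    Description=ZIMon to Grafana bridge (zimonGrafanaIntf.py)
    After=network-online.target pmcollector.service

    [Service]
    Type=simple
    # replace the path and "localhost" with your bridge location and pmcollector host
    ExecStart=/usr/bin/python /opt/IBM/zimon/zimonGrafanaIntf.py -s localhost
    Restart=on-failure
    RestartSec=30

    [Install]
    WantedBy=multi-user.target

Then reload and start it once:

    systemctl daemon-reload
    systemctl enable zimon-grafana-bridge.service
    systemctl start zimon-grafana-bridge.service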
[root at linuscs107 zimonGrafanaIntf]# python zimonGrafanaIntf.py -s linuscs107.gsc.wustl.edu Failed to initialize MetadataHandler, please check log file for reason #cat pmmonitor.log 2017-07-20 09:41:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 09:41:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 09:41:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting Thank you Chakri From Robert.Oesterlin at nuance.com Thu Jul 20 16:06:48 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 20 Jul 2017 15:06:48 +0000 Subject: [gpfsug-discuss] mmsysmon and CCR Message-ID: I recently ran into an issue where the frequency of mmsysmon polling (GPFS 4.2.2) was causing issues with CCR updates. I eventually ended decreasing the polling interval to 30 mins (I don?t have any CES) which seemed to solve the issue. So, if you have a large cluster, be on the lookout for CCR issues, if you have that configured. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 17:38:25 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 11:38:25 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> References: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> Message-ID: <31b9b441-f51c-c0d1-11e0-b01a070f9e4e@wustl.edu> cat zserver.log 2017-07-20 11:21:59,001 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) 2017-07-20 11:32:29,090 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) Thank you Chakri On 7/20/17 9:57 AM, Chakravarthy Girda wrote: > Hi There, > > I was running a bridge port services to push my stats to grafana. It > was running fine until we started some rigorous IOPS testing on the > cluster. Now its failing to start with the following error. > > Questions: > > 1. Any clues on it fix? > 2. Is there anyway I can run this in a service/daemon mode rather than > running in a screen session? > > > [root at linuscs107 zimonGrafanaIntf]# python zimonGrafanaIntf.py -s > linuscs107.gsc.wustl.edu > Failed to initialize MetadataHandler, please check log file for reason > > #cat pmmonitor.log > > 2017-07-20 09:41:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 09:41:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 09:41:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > > > Thank you > Chakri > > > > > From Robert.Oesterlin at nuance.com Thu Jul 20 17:50:12 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 20 Jul 2017 16:50:12 +0000 Subject: [gpfsug-discuss] pmmonitor - ERROR Message-ID: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> This looks like the Grafana bridge could not connect to the pmcollector process - is it running normally? See if some of the normal ?mmprefmon? 
commands work and/or look at the log file on the pmcollector node. (under /var/log/zimon) You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) Bob Oesterlin Sr Principal Storage Engineer, Nuance On 7/20/17, 11:38 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Chakravarthy Girda" wrote: 2017-07-20 11:32:29,090 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) From mdharris at us.ibm.com Thu Jul 20 17:55:56 2017 From: mdharris at us.ibm.com (Michael D Harris) Date: Thu, 20 Jul 2017 12:55:56 -0400 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 66, Issue 34 In-Reply-To: References: Message-ID: Hi Bob, The CCR monitor interval is addressed in 4.2.3 or 4.2.3 ptf1 Regards, Mike Harris Spectrum Scale Development - Core Team -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 18:12:09 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 12:12:09 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> Message-ID: Bob, Your correct. Found the issues with pmcollector services. Fixed issues with pmcollector, resolved the issues. Thank you Chakri On 7/20/17 11:50 AM, Oesterlin, Robert wrote: > You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) From cgirda at wustl.edu Thu Jul 20 18:30:03 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 12:30:03 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> Message-ID: <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> Bob, Actually the pmcollector service died in 5min. 2017-07-20 12:11:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:16:29,470 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:16:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:21:29,384 - pmmonitor - ERROR - QueryHandler: Socket connection broken, received no data 2017-07-20 12:21:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 12:21:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 12:21:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting Thank you Chakri On 7/20/17 12:12 PM, Chakravarthy Girda wrote: > Bob, > > Your correct. Found the issues with pmcollector services. Fixed issues > with pmcollector, resolved the issues. > > > Thank you > > Chakri > > > On 7/20/17 11:50 AM, Oesterlin, Robert wrote: >> You will also see this node when the pmcollector process is still initializing. 
(reading in the existing database, not ready to service requests) From cgirda at wustl.edu Thu Jul 20 21:03:56 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 15:03:56 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> Message-ID: For now I switched the "zimonGrafanaIntf" to port "4262". So far it didn't crash the pmcollector. Will wait for some more time to ensure its working. * Can we start this process in a daemon or service mode? Thank you Chakri On 7/20/17 12:30 PM, Chakravarthy Girda wrote: > Bob, > > Actually the pmcollector service died in 5min. > > 2017-07-20 12:11:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:16:29,470 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:16:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:21:29,384 - pmmonitor - ERROR - QueryHandler: Socket > connection broken, received no data > 2017-07-20 12:21:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 12:21:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 12:21:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > > Thank you > Chakri > > > On 7/20/17 12:12 PM, Chakravarthy Girda wrote: >> Bob, >> >> Your correct. Found the issues with pmcollector services. Fixed issues >> with pmcollector, resolved the issues. >> >> >> Thank you >> >> Chakri >> >> >> On 7/20/17 11:50 AM, Oesterlin, Robert wrote: >>> You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From cgirda at wustl.edu Thu Jul 20 21:42:09 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 15:42:09 -0500 Subject: [gpfsug-discuss] zimonGrafanaIntf template variable Message-ID: <00372fdc-a0b7-26ac-84c1-aa32c78e4261@wustl.edu> Hi, I imported the pre-built grafana dashboard. https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/a180eb7e-9161-4e07-a6e4-35a0a076f7b3/attachment/5e9a5886-5bd9-4a6f-919e-bc66d16760cf/media/default%20dashboards%20set.zip Get updates from few graphs but not all. I realize that I need to update the template variables. Eg:- I get into the "File Systems View" Variable ( gpfsMetrics_fs1 ) --> Query ( gpfsMetrics_fs1 ) Regex ( /.*[^gpfs_fs_inode_used|gpfs_fs_inode_alloc|gpfs_fs_inode_free|gpfs_fs_inode_max]/ ) Question: * How can I execute the above Query and regex to fix the issues. * Is there any document on CLI options? Thank you Chakri -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jul 21 22:13:17 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 21 Jul 2017 17:13:17 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Message-ID: <28986.1500671597@turing-police.cc.vt.edu> So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive service. 
Inode size is 4K, and we had a requirement to encrypt-at-rest, so encryption is in play as well. Data is replicated 2x and fragment size is 32K. I was investigating how much data-in-inode would help deal with users who put large trees of small files into the archive (yes, I know we can use applypolicy with external programs to tarball offending directories, but that's a separate discussion ;) ## ls -ls * 64 -rw-r--r-- 1 root root 2048 Jul 21 14:47 random.data 64 -rw-r--r-- 1 root root 512 Jul 21 14:48 random.data.1 64 -rw-r--r-- 1 root root 128 Jul 21 14:50 random.data.2 64 -rw-r--r-- 1 root root 32 Jul 21 14:50 random.data.3 64 -rw-r--r-- 1 root root 16 Jul 21 14:50 random.data.4 Hmm.. I was expecting at least *some* of these to fit in the inode, and not take 2 32K blocks... ## mmlsattr -d -L random.data.4 file name: random.data.4 metadata replication: 2 max 2 data replication: 2 max 2 immutable: no appendOnly: no flags: storage pool name: system fileset name: root snapshot name: creation time: Fri Jul 21 14:50:51 2017 Misc attributes: ARCHIVE Encrypted: yes gpfs.Encryption: 0x4541 (... another 296 hex digits) EncPar 'AES:256:XTS:FEK:HMACSHA512' type: wrapped FEK WrpPar 'AES:KWRAP' CmbPar 'XORHMACSHA512' KEY-97c7f4b7-06cb-4a53-b317-1c187432dc62:archKEY1_gpfsG1 Hmm.. Doesn't *look* like enough extended attributes to prevent storing even 16 bytes in the inode, should be room for around 3.5K minus the above 250 bytes or so of attributes.... What am I missing here? Does "encrypted" or LTFS/EE disable data-in-inode? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From oehmes at gmail.com Fri Jul 21 23:04:32 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 21 Jul 2017 22:04:32 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <28986.1500671597@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: Hi, i talked with a few others to confirm this, but unfortunate this is a limitation of the code today (maybe not well documented which we will look into). Encryption only encrypts data blocks, it doesn't encrypt metadata. Hence, if encryption is enabled, we don't store data in the inode, because then it wouldn't be encrypted. For the same reason HAWC and encryption are incompatible. Sven On Fri, Jul 21, 2017 at 2:13 PM wrote: > So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive > service. > Inode size is 4K, and we had a requirement to encrypt-at-rest, so > encryption > is in play as well. Data is replicated 2x and fragment size is 32K. > > I was investigating how much data-in-inode would help deal with users who > put > large trees of small files into the archive (yes, I know we can use > applypolicy > with external programs to tarball offending directories, but that's a > separate > discussion ;) > > ## ls -ls * > 64 -rw-r--r-- 1 root root 2048 Jul 21 14:47 random.data > 64 -rw-r--r-- 1 root root 512 Jul 21 14:48 random.data.1 > 64 -rw-r--r-- 1 root root 128 Jul 21 14:50 random.data.2 > 64 -rw-r--r-- 1 root root 32 Jul 21 14:50 random.data.3 > 64 -rw-r--r-- 1 root root 16 Jul 21 14:50 random.data.4 > > Hmm.. I was expecting at least *some* of these to fit in the inode, and > not take 2 32K blocks... 
> > ## mmlsattr -d -L random.data.4 > file name: random.data.4 > metadata replication: 2 max 2 > data replication: 2 max 2 > immutable: no > appendOnly: no > flags: > storage pool name: system > fileset name: root > snapshot name: > creation time: Fri Jul 21 14:50:51 2017 > Misc attributes: ARCHIVE > Encrypted: yes > gpfs.Encryption: 0x4541 (... another 296 hex digits) > EncPar 'AES:256:XTS:FEK:HMACSHA512' > type: wrapped FEK WrpPar 'AES:KWRAP' CmbPar 'XORHMACSHA512' > KEY-97c7f4b7-06cb-4a53-b317-1c187432dc62:archKEY1_gpfsG1 > > Hmm.. Doesn't *look* like enough extended attributes to prevent storing > even > 16 bytes in the inode, should be room for around 3.5K minus the above 250 > bytes > or so of attributes.... > > What am I missing here? Does "encrypted" or LTFS/EE disable data-in-inode? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jul 21 23:24:13 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 21 Jul 2017 18:24:13 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <33069.1500675853@turing-police.cc.vt.edu> On Fri, 21 Jul 2017 22:04:32 -0000, Sven Oehme said: > i talked with a few others to confirm this, but unfortunate this is a > limitation of the code today (maybe not well documented which we will look > into). Encryption only encrypts data blocks, it doesn't encrypt metadata. > Hence, if encryption is enabled, we don't store data in the inode, because > then it wouldn't be encrypted. For the same reason HAWC and encryption are > incompatible. I can live with that restriction if it's documented better, thanks... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From p.childs at qmul.ac.uk Mon Jul 24 10:29:49 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 09:29:49 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. Message-ID: <1500888588.571.3.camel@qmul.ac.uk> We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. From ilan84 at gmail.com Mon Jul 24 11:36:41 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Mon, 24 Jul 2017 13:36:41 +0300 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication Message-ID: Hi, I have gpfs with 2 Nodes (redhat). 
I am trying to create NFS share - So I would be able to mount and access it from another linux machine. I receive error: Current authentication: none is invalid. What do i need to configure ? PLEASE NOTE: I dont have the SMB package at the moment, I dont want authentication on the NFS export.. While trying to create NFS (I execute the following): [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" I receive the following error: [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmuserauth service list FILE access not configured PARAMETERS VALUES ------------------------------------------------- OBJECT access not configured PARAMETERS VALUES ------------------------------------------------- [root at LH20-GPFS1 ~]# Some additional information on cluster: ============================== [root at LH20-GPFS1 ~]# mmlsmgr file system manager node ---------------- ------------------ fs_gpfs01 10.10.158.61 (LH20-GPFS1) Cluster manager node: 10.10.158.61 (LH20-GPFS1) [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: LH20-GPFS1 GPFS cluster id: 10777108240438931454 GPFS UID domain: LH20-GPFS1 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 quorum From jonathan at buzzard.me.uk Mon Jul 24 12:43:10 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Jul 2017 12:43:10 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <28986.1500671597@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <1500896590.4387.167.camel@buzzard.me.uk> On Fri, 2017-07-21 at 17:13 -0400, valdis.kletnieks at vt.edu wrote: > So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive service. > Inode size is 4K, and we had a requirement to encrypt-at-rest, so encryption > is in play as well. Data is replicated 2x and fragment size is 32K. > For an archive service how about only accepting files in actual "archive" formats and then severely restricting the number of files a user can have? By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. Has a number of effects. Firstly it makes the files "big" so they move to tape efficiently. It also makes it less likely the end user will try and use it as an general purpose file server. As it's an archive there should be no problem for the user to bundle all the files into a .zip file or similar. Noting that Windows Vista and up handle ZIP64 files getting around the older 4GB and 65k files limit. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
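If you go the archive-formats-only route, the policy engine can at least report who is dropping loose small files before anything is enforced. A rough, list-only sketch - rule names, the size threshold and the extension list are placeholders, and -I defer means nothing is moved or deleted:

    /* smallfiles.pol - list small non-archive files, illustrative only */
    RULE EXTERNAL LIST 'loose_small_files' EXEC ''
    RULE 'find_loose' LIST 'loose_small_files'
         SHOW(VARCHAR(USER_ID) || ' ' || VARCHAR(FILE_SIZE))
         WHERE FILE_SIZE < 1048576
           AND LOWER(NAME) NOT LIKE '%.zip'
           AND LOWER(NAME) NOT LIKE '%.tar.gz'
           AND LOWER(NAME) NOT LIKE '%.tgz'

    # generate the list files without acting on anything
    mmapplypolicy <fsname> -P smallfiles.pol -I defer -f /tmp/loose

Summing the resulting list per UID gives a quick view of which users would be affected by a files-per-user limit.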
From stefan.dietrich at desy.de Mon Jul 24 13:19:47 2017 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Mon, 24 Jul 2017 14:19:47 +0200 (CEST) Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: <1981958989.2609398.1500898787132.JavaMail.zimbra@desy.de> Yep, have look at this Gist [1] The unit files assumes some paths and users, which are created during the installation of my RPM. [1] https://gist.github.com/stdietrich/b3b985f872ea648d6c03bb6249c44e72 Regards, Stefan ----- Original Message ----- > From: "Greg Lehmann" > To: gpfsug-discuss at spectrumscale.org > Sent: Wednesday, July 19, 2017 9:53:58 AM > Subject: Re: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data > I?m having a play with this now too. Has anybody coded a systemd unit to handle > step 2b in the knowledge centre article ? bridge creation on the gpfs side? It > would save me a bit of effort. > > > > I?m also wondering about the CherryPy version. It looks like this has been > developed on SLES which has the newer version mentioned as a standard package > and yet RHEL with an older version of CherryPy is perhaps more common as it > seems to have the best support for features of GPFS, like object and block > protocols. Maybe SLES is in favour now? > > > > Cheers, > > > > Greg > > > > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie > Sent: Thursday, 6 July 2017 3:07 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no > data > > > > > Greetings, > > > > > > > > > I'm currently setting up Grafana to interact with one of our Scale Clusters > > > and i've followed the knowledge centre link in terms of setup. 
> > > > > > [ > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm > | > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm > ] > > > > > > However while everything appears to be working i'm not seeing any data coming > through the reports within the grafana server, even though I can see data in > the Scale GUI > > > > > > The current environment: > > > > > > [root at sc01n02 ~]# mmlscluster > > > GPFS cluster information > ======================== > GPFS cluster name: sc01.spectrum > GPFS cluster id: 18085710661892594990 > GPFS UID domain: sc01.spectrum > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > > Node Daemon node name IP address Admin node name Designation > ------------------------------------------------------------------ > 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon > 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon > 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon > > > [root at sc01n02 ~]# > > > > > > > > > [root at sc01n02 ~]# mmlsconfig > Configuration data for cluster sc01.spectrum: > --------------------------------------------- > clusterName sc01.spectrum > clusterId 18085710661892594990 > autoload yes > profile gpfsProtocolDefaults > dmapiFileHandleSize 32 > minReleaseLevel 4.2.2.0 > ccrEnabled yes > cipherList AUTHONLY > maxblocksize 16M > [cesNodes] > maxMBpS 5000 > numaMemoryInterleave yes > enforceFilesetQuotaOnRoot yes > workerThreads 512 > [common] > tscCmdPortRange 60000-61000 > cesSharedRoot /ibm/cesSharedRoot/ces > cifsBypassTraversalChecking yes > syncSambaMetadataOps yes > cifsBypassShareLocksOnRename yes > adminMode central > > > File systems in cluster sc01.spectrum: > -------------------------------------- > /dev/cesSharedRoot > /dev/icos_demo > /dev/scale01 > [root at sc01n02 ~]# > > > > > > > > > [root at sc01n02 ~]# systemctl status pmcollector > ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. > Loaded: loaded (/etc/rc.d/init.d/pmcollector) > Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago > Docs: man:systemd-sysv-generator(8) > Main PID: 2693 (ZIMonCollector) > CGroup: /system.slice/pmcollector.service > ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... > ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... > > > May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance > mon...... > May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor > collector... > May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance > moni...r.. > Hint: Some lines were ellipsized, use -l to show in full. > > > > > > From Grafana Server: > > > > > > > > > > > > > > > when I send a set of files to the cluster (3.8GB) I can see performance metrics > within the Scale GUI > > > > > > > > > > > > yet from the Grafana Dashboard im not seeing any data points > > > > > > > > > > > > Can anyone provide some hints as to what might be happening? 
> > > > > > > > > > > > Regards, > > > > > > > > > Andrew Beattie > > > Software Defined Storage - IT Specialist > > > Phone: 614-2133-7927 > > > E-mail: [ mailto:abeattie at au1.ibm.com | abeattie at au1.ibm.com ] > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jjdoherty at yahoo.com Mon Jul 24 14:11:12 2017 From: jjdoherty at yahoo.com (Jim Doherty) Date: Mon, 24 Jul 2017 13:11:12 +0000 (UTC) Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> Message-ID: <261384244.3866909.1500901872347@mail.yahoo.com> There are 3 places that the GPFS mmfsd uses memory? the pagepool? plus 2 shared memory segments.?? To see the memory utilization of the shared memory segments run the command?? mmfsadm dump malloc .??? The statistics for memory pool id 2 is where? maxFilesToCache/maxStatCache objects are? and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.?? You might want to upgrade to later PTF? as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.?? On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 24 14:30:49 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 13:30:49 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <261384244.3866909.1500901872347@mail.yahoo.com> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> Message-ID: <1500903047.571.7.camel@qmul.ac.uk> I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. 
[root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjdoherty at yahoo.com Mon Jul 24 15:10:45 2017 From: jjdoherty at yahoo.com (Jim Doherty) Date: Mon, 24 Jul 2017 14:10:45 +0000 (UTC) Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1500903047.571.7.camel@qmul.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> Message-ID: <1770436429.3911327.1500905445052@mail.yahoo.com> How are you identifying? the high memory usage???? On Monday, July 24, 2017 9:30 AM, Peter Childs wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. 
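One way to tell whether this is a steady leak or a step change is to log the daemon size alongside mmdiag --memory for a few days. A minimal sketch - the log path and interval are arbitrary:

    # append mmfsd size and the GPFS memory-pool stats once an hour
    while true; do
        {
          date
          ps -o pid,vsz,rss,comm -C mmfsd
          /usr/lpp/mmfs/bin/mmdiag --memory
        } >> /var/tmp/mmfsd-mem.log
        sleep 3600
    done

If the RSS climbs while the pool figures stay flat, that points at heap growth outside the reported pools, which is worth raising with support.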
The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory ===mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")???????????128 bytes in use???17500049370 hard limit on memory usage???????1048576 bytes committed to regions?????????????1 number of regions???????????555 allocations???????????555 frees?????????????0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment")??????42179592 bytes in use???17500049370 hard limit on memory usage??????56623104 bytes committed to regions?????????????9 number of regions????????100027 allocations?????????79624 frees?????????????0 allocation failures Statistics for MemoryPool id 3 ("Token Manager")???????2099520 bytes in use???17500049370 hard limit on memory usage??????16778240 bytes committed to regions?????????????1 number of regions?????????????4 allocations?????????????0 frees?????????????0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory? the pagepool? plus 2 shared memory segments.?? To see the memory utilization of the shared memory segments run the command?? mmfsadm dump malloc .??? The statistics for memory pool id 2 is where? maxFilesToCache/maxStatCache objects are? and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.?? You might want to upgrade to later PTF? as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.?? On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter ChildsITS Research StorageQueen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 24 15:21:27 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 14:21:27 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. 
In-Reply-To: <1770436429.3911327.1500905445052@mail.yahoo.com> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> Message-ID: <1500906086.571.9.camel@qmul.ac.uk> top but ps gives the same value. [root at dn29 ~]# ps auww -q 4444 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 4444 2.7 22.3 10537600 5472580 ? S wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. 
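Since the RSS here (around 5.4 GB) is well beyond pagepool plus the shared segments mmdiag reports, it may be worth breaking the daemon's mappings down directly. A minimal sketch with standard tools - smem is optional and may need installing:

    # largest mappings inside mmfsd; the pagepool shows up as one big region
    pmap -x $(pgrep -x mmfsd) | sort -k3 -n | tail -20
    # per-process USS/PSS/RSS summary, if smem is available
    smem -k -P mmfsd

Whatever does not line up with the pagepool or the shared segments (heap, thread stacks from a higher workerThreads setting, and so on) is the interesting part to take to support.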
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.huffman at crick.ac.uk Mon Jul 24 15:40:51 2017 From: adam.huffman at crick.ac.uk (Adam Huffman) Date: Mon, 24 Jul 2017 14:40:51 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1500906086.571.9.camel@qmul.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> <1500906086.571.9.camel@qmul.ac.uk> Message-ID: <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> smem is recommended here Cheers, Adam -- Adam Huffman Senior HPC and Cloud Systems Engineer The Francis Crick Institute 1 Midland Road London NW1 1AT T: 020 3796 1175 E: adam.huffman at crick.ac.uk W: www.crick.ac.uk On 24 Jul 2017, at 15:21, Peter Childs > wrote: top but ps gives the same value. [root at dn29 ~]# ps auww -q 4444 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 4444 2.7 22.3 10537600 5472580 ? S> wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. 
On Monday, July 24, 2017 5:29 AM, Peter Childs > wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From jamiedavis at us.ibm.com Mon Jul 24 15:45:26 2017 From: jamiedavis at us.ibm.com (James Davis) Date: Mon, 24 Jul 2017 14:45:26 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <33069.1500675853@turing-police.cc.vt.edu> References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: attlisjw.dat Type: application/octet-stream Size: 497 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jul 24 15:50:57 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 24 Jul 2017 14:50:57 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> Message-ID: I suppose the distinction between data, metadata and data IN metadata could be made. Whilst it is clear to me (us) now, perhaps the thought was that the data would be encrypted even if it was stored inside the metadata. My two pence. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of James Davis Sent: 24 July 2017 15:45 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Hey all, On the documentation of encryption restrictions and encryption/HAWC interplay... The encryption documentation currently states: "Secure storage uses encryption to make data unreadable to anyone who does not possess the necessary encryption keys...Only data, not metadata, is encrypted." The HAWC restrictions include: "Encrypted data is never stored in the recovery log..." If this is unclear, I'm open to suggestions for improvements. Cordially, Jamie ----- Original message ----- From: valdis.kletnieks at vt.edu Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Date: Fri, Jul 21, 2017 6:24 PM On Fri, 21 Jul 2017 22:04:32 -0000, Sven Oehme said: > i talked with a few others to confirm this, but unfortunate this is a > limitation of the code today (maybe not well documented which we will look > into). Encryption only encrypts data blocks, it doesn't encrypt metadata. > Hence, if encryption is enabled, we don't store data in the inode, because > then it wouldn't be encrypted. For the same reason HAWC and encryption are > incompatible. I can live with that restriction if it's documented better, thanks... [Document Icon]attq4saq.dat Type: application/pgp-signature Name: attq4saq.dat _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Mon Jul 24 15:57:13 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Jul 2017 15:57:13 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <33069.1500675853@turing-police.cc.vt.edu> , <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <1500908233.4387.194.camel@buzzard.me.uk> On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: > Hey all, > > On the documentation of encryption restrictions and encryption/HAWC > interplay... > > The encryption documentation currently states: > > "Secure storage uses encryption to make data unreadable to anyone who > does not possess the necessary encryption keys...Only data, not > metadata, is encrypted." > > The HAWC restrictions include: > > "Encrypted data is never stored in the recovery log..." > > If this is unclear, I'm open to suggestions for improvements. > Just because *DATA* is stored in the metadata does not make it magically metadata. It's still data so you could quite reasonably conclude that it is encrypted. We have now been disabused of this, but the documentation is not clear and needs clarifying. Perhaps say metadata blocks are not encrypted. Or just a simple data stored in inodes is not encrypted would suffice. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Mon Jul 24 16:49:07 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Jul 2017 11:49:07 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? 
In-Reply-To: <1500896590.4387.167.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> Message-ID: <17702.1500911347@turing-police.cc.vt.edu> On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > For an archive service how about only accepting files in actual > "archive" formats and then severely restricting the number of files a > user can have? > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. After having dealt with users who fill up disk storage for almost 4 decades now, I'm fully aware of those advantages. :) ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" in 1978, and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ square feet, and now 8T drives are all over the place...) On the flip side, my current project is migrating 5 petabytes of data from our old archive system that didn't have such rules (mostly due to politics and the fact that the underlying XFS filesystem uses a 4K blocksize so it wasn't as big an issue), so I'm stuck with what people put in there years ago. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Mon Jul 24 16:49:26 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 24 Jul 2017 15:49:26 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: Ilan, you must create some type of authentication mechanism for CES to work properly first. If you want a quick and dirty way that would just use your local /etc/passwd try this. /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined Mark -----Original Message----- From: Ilan Schwarts [mailto:ilan84 at gmail.com] Sent: Monday, July 24, 2017 5:37 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication Hi, I have gpfs with 2 Nodes (redhat). I am trying to create NFS share - So I would be able to mount and access it from another linux machine. I receive error: Current authentication: none is invalid. What do i need to configure ? PLEASE NOTE: I dont have the SMB package at the moment, I dont want authentication on the NFS export.. While trying to create NFS (I execute the following): [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" I receive the following error: [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. 
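For reference, a hedged sketch of the sequence being suggested here, run from a CES protocol node; the path and the wide-open client wildcard simply mirror the post above and should be tightened for anything beyond a lab:

# File authentication handled outside Spectrum Scale (local /etc/passwd, your own LDAP, ...)
/usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined

# NFSv3-only export that trusts the UID/GID the client presents
/usr/lpp/mmfs/bin/mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3,Squash=no_root_squash)"

# Verify
/usr/lpp/mmfs/bin/mmuserauth service list
/usr/lpp/mmfs/bin/mmnfs export list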
[root at LH20-GPFS1 ~]# mmuserauth service list FILE access not configured PARAMETERS VALUES ------------------------------------------------- OBJECT access not configured PARAMETERS VALUES ------------------------------------------------- [root at LH20-GPFS1 ~]# Some additional information on cluster: ============================== [root at LH20-GPFS1 ~]# mmlsmgr file system manager node ---------------- ------------------ fs_gpfs01 10.10.158.61 (LH20-GPFS1) Cluster manager node: 10.10.158.61 (LH20-GPFS1) [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: LH20-GPFS1 GPFS cluster id: 10777108240438931454 GPFS UID domain: LH20-GPFS1 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 quorum _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions From valdis.kletnieks at vt.edu Mon Jul 24 17:35:34 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Jul 2017 12:35:34 -0400 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: <27469.1500914134@turing-police.cc.vt.edu> On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: > Hi, > I have gpfs with 2 Nodes (redhat). > I am trying to create NFS share - So I would be able to mount and > access it from another linux machine. > While trying to create NFS (I execute the following): > [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* > Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" You can get away with little to no authentication for NFSv3, but not for NFSv4. Try with Protocols=3 only and mmuserauth service create --type userdefined that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS client tells you". This of course only works sanely if each NFS export is only to a set of machines in the same administrative domain that manages their UID/GIDs. Exporting to two sets of machines that don't coordinate their UID/GID space is, of course, where hilarity and hijinks ensue.... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From luke.raimbach at googlemail.com Mon Jul 24 23:23:03 2017 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Mon, 24 Jul 2017 22:23:03 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> <1500906086.571.9.camel@qmul.ac.uk> <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> Message-ID: Switch of CCR and see what happens. On Mon, 24 Jul 2017, 15:40 Adam Huffman, wrote: > smem is recommended here > > Cheers, > Adam > > -- > > Adam Huffman > Senior HPC and Cloud Systems Engineer > The Francis Crick Institute > 1 Midland Road > London NW1 1AT > > T: 020 3796 1175 > E: adam.huffman at crick.ac.uk > W: www.crick.ac.uk > > > > > > On 24 Jul 2017, at 15:21, Peter Childs wrote: > > > top > > but ps gives the same value. > > [root at dn29 ~]# ps auww -q 4444 > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > root 4444 2.7 22.3 10537600 5472580 ? S /usr/lpp/mmfs/bin/mmfsd > > Thanks for the help > > Peter. > > > On Mon, 2017-07-24 at 14:10 +0000, Jim Doherty wrote: > > How are you identifying the high memory usage? > > > On Monday, July 24, 2017 9:30 AM, Peter Childs > wrote: > > > I've had a look at mmfsadm dump malloc and it looks to agree with the > output from mmdiag --memory. and does not seam to account for the excessive > memory usage. > > The new machines do have idleSocketTimout set to 0 from what your saying > it could be related to keeping that many connections between nodes working. > > Thanks in advance > > Peter. > > > > > [root at dn29 ~]# mmdiag --memory > > === mmdiag: memory === > mmfsd heap size: 2039808 bytes > > > Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") > 128 bytes in use > 17500049370 hard limit on memory usage > 1048576 bytes committed to regions > 1 number of regions > 555 allocations > 555 frees > 0 allocation failures > > > Statistics for MemoryPool id 2 ("Shared Segment") > 42179592 bytes in use > 17500049370 hard limit on memory usage > 56623104 bytes committed to regions > 9 number of regions > 100027 allocations > 79624 frees > 0 allocation failures > > > Statistics for MemoryPool id 3 ("Token Manager") > 2099520 bytes in use > 17500049370 hard limit on memory usage > 16778240 bytes committed to regions > 1 number of regions > 4 allocations > 0 frees > 0 allocation failures > > > On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: > > There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 > shared memory segments. To see the memory utilization of the shared > memory segments run the command mmfsadm dump malloc . The statistics > for memory pool id 2 is where maxFilesToCache/maxStatCache objects are > and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. > > You might want to upgrade to later PTF as there was a PTF to fix a memory > leak that occurred in tscomm associated with network connection drops. > > > On Monday, July 24, 2017 5:29 AM, Peter Childs > wrote: > > > We have two GPFS clusters. > > One is fairly old and running 4.2.1-2 and non CCR and the nodes run > fine using up about 1.5G of memory and is consistent (GPFS pagepool is > set to 1G, so that looks about right.) 
> > The other one is "newer" running 4.2.1-3 with CCR and the nodes keep > increasing in there memory usage, starting at about 1.1G and are find > for a few days however after a while they grow to 4.2G which when the > node need to run real work, means the work can't be done. > > I'm losing track of what maybe different other than CCR, and I'm trying > to find some more ideas of where to look. > > I'm checked all the standard things like pagepool and maxFilesToCache > (set to the default of 4000), workerThreads is set to 128 on the new > gpfs cluster (against default 48 on the old) > > I'm not sure what else to look at on this one hence why I'm asking the > community. > > Thanks in advance > > Peter Childs > ITS Research Storage > Queen Mary University of London. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > The Francis Crick Institute Limited is a registered charity in England and > Wales no. 1140062 and a company registered in England and Wales no. > 06885462, with its registered office at 1 Midland Road London NW1 1AT > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 25 05:52:11 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 25 Jul 2017 07:52:11 +0300 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: <27469.1500914134@turing-police.cc.vt.edu> References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). 
>> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From ulmer at ulmer.org Tue Jul 25 06:33:13 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Tue, 25 Jul 2017 01:33:13 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500908233.4387.194.camel@buzzard.me.uk> References: <33069.1500675853@turing-police.cc.vt.edu> <28986.1500671597@turing-police.cc.vt.edu> <1500908233.4387.194.camel@buzzard.me.uk> Message-ID: <1233C5A4-A8C9-4A56-AEC3-AE65DBB5D346@ulmer.org> > On Jul 24, 2017, at 10:57 AM, Jonathan Buzzard > wrote: > > On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: >> Hey all, >> >> On the documentation of encryption restrictions and encryption/HAWC >> interplay... >> >> The encryption documentation currently states: >> >> "Secure storage uses encryption to make data unreadable to anyone who >> does not possess the necessary encryption keys...Only data, not >> metadata, is encrypted." >> >> The HAWC restrictions include: >> >> "Encrypted data is never stored in the recovery log..." >> >> If this is unclear, I'm open to suggestions for improvements. >> > > Just because *DATA* is stored in the metadata does not make it magically > metadata. It's still data so you could quite reasonably conclude that it > is encrypted. > [?] > JAB. +1. Also, "Encrypted data is never stored in the recovery log?" does not make it clear whether: The data that is supposed to be encrypted is not written to the recovery log. The data that is supposed to be encrypted is written to the recovery log, but is not encrypted there. Thanks, -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Tue Jul 25 10:02:14 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Jul 2017 10:02:14 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <17702.1500911347@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> Message-ID: <1500973334.4387.201.camel@buzzard.me.uk> On Mon, 2017-07-24 at 11:49 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > > > For an archive service how about only accepting files in actual > > "archive" formats and then severely restricting the number of files a > > user can have? > > > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. 
> > After having dealt with users who fill up disk storage for almost 4 decades > now, I'm fully aware of those advantages. :) > > ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" in 1978, > and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ square feet, and > now 8T drives are all over the place...) > > On the flip side, my current project is migrating 5 petabytes of data from our > old archive system that didn't have such rules (mostly due to politics and the > fact that the underlying XFS filesystem uses a 4K blocksize so it wasn't as big > an issue), so I'm stuck with what people put in there years ago. I would be tempted to zip up the directories and move them ziped ;-) JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From john.hearns at asml.com Tue Jul 25 10:30:28 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 25 Jul 2017 09:30:28 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500973334.4387.201.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: I agree with Jonathan. In my experience, if you look at why there are many small files being stored by researchers, these are either the results of data acquisition - high speed cameras, microscopes, or in my experience a wind tunnel. Or the images are a sequence of images produced by a simulation which are later post-processed into a movie or Ensight/Paraview format. When questioned, the resaechers will always say "but I would like to keep this data available just in case". In reality those files are never looked at again. And as has been said if you have a tape based archiving system you could end up with thousands of small files being spread all over your tapes. So it is legitimate to make zips / tars of directories like that. I am intrigued to see that GPFS has a policy facility which can call an external program. That is useful. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Buzzard Sent: Tuesday, July 25, 2017 11:02 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? On Mon, 2017-07-24 at 11:49 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > > > For an archive service how about only accepting files in actual > > "archive" formats and then severely restricting the number of files > > a user can have? > > > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. > > After having dealt with users who fill up disk storage for almost 4 > decades now, I'm fully aware of those advantages. :) > > ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" > in 1978, and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ > square feet, and now 8T drives are all over the place...) > > On the flip side, my current project is migrating 5 petabytes of data > from our old archive system that didn't have such rules (mostly due to > politics and the fact that the underlying XFS filesystem uses a 4K > blocksize so it wasn't as big an issue), so I'm stuck with what people put in there years ago. I would be tempted to zip up the directories and move them ziped ;-) JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7Ce8a4016223414177bf9408d4d33bdb31%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=pean0PRBgJJmtbZ7TwO%2BxiSvhKsba%2FRGI9VUCxhp6kM%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From jonathan at buzzard.me.uk Tue Jul 25 12:22:49 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Jul 2017 12:22:49 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: <1500981769.4387.222.camel@buzzard.me.uk> On Tue, 2017-07-25 at 09:30 +0000, John Hearns wrote: > I agree with Jonathan. > > In my experience, if you look at why there are many small files being > stored by researchers, these are either the results of data acquisition > - high speed cameras, microscopes, or in my experience a wind tunnel. > Or the images are a sequence of images produced by a simulation which > are later post-processed into a movie or Ensight/Paraview format. When > questioned, the resaechers will always say "but I would like to keep > this data available just in case". In reality those files are never > looked at again. And as has been said if you have a tape based > archiving system you could end up with thousands of small files being > spread all over your tapes. So it is legitimate to make zips / tars of > directories like that. > Note that rules on data retention may require them to keep them for 10 years, so it is not unreasonable. Letting them spew thousands of files into an "archive" is not sensible. I was thinking of ways of getting the users to do it, and I guess leaving them with zero available file number quota in the new system would force them to zip up their data so they could add new stuff ;-) Archives in my view should have no quota on the space, only quota's on the number of files. Of course that might not be very popular. On reflection I think I would use a policy to restrict to files ending with .zip/.ZIP only. It's an archive and this format is effectively open source, widely understood and cross platform, and with the ZIP64 version will now stand the test of time too. Given it's an archive I would have a script that ran around setting all the files to immutable 7 days after creation too. 
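As a rough sketch of that idea and nothing more: the file system path, file names and 7-day window below are made up, and the exact list-file naming and line format produced by mmapplypolicy should be checked against the documentation for your release before relying on it:

# Policy that lists ZIP archives more than 7 days old
cat > /var/tmp/lock-archives.pol <<'EOF'
RULE EXTERNAL LIST 'tolock' EXEC ''
RULE 'agedzips' LIST 'tolock'
  WHERE UPPER(NAME) LIKE '%.ZIP'
    AND (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME)) > 7
EOF

# -I defer with -f only writes the candidate list (here /var/tmp/lock-archives.list.tolock)
/usr/lpp/mmfs/bin/mmapplypolicy /archive -P /var/tmp/lock-archives.pol -I defer -f /var/tmp/lock-archives

# Each list line should end in " -- /full/path"; strip the prefix and set the immutable flag
sed 's/.* -- //' /var/tmp/lock-archives.list.tolock | while read -r f; do
    /usr/lpp/mmfs/bin/mmchattr -i yes "$f"
done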
Or maybe change the ownership and set a readonly ACL to the original user. Need to stop them changing stuff after the event if you are going to use to as part of your anti research fraud measures. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Tue Jul 25 17:11:45 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 25 Jul 2017 12:11:45 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500973334.4387.201.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: <88035.1500999105@turing-police.cc.vt.edu> On Tue, 25 Jul 2017 10:02:14 +0100, Jonathan Buzzard said: > I would be tempted to zip up the directories and move them ziped ;-) Not an option, unless you want to come here and re-write the researcher's tracking systems that knows where they archived a given run, and teach it "Except now it's in a .tar.gz in that directory, or perhaps one or two directories higher up, under some name". Yes, researchers do that. And as the joke goes: "What's the difference between a tenured professor and a terrorist?" "You can negotiate with a terrorist..." Plus remember that most of these directories are currently scattered across multiple tapes, which means "zip up a directory" may mean reading as many as 10 to 20 tapes just to get the directory on disk so you can zip it up. As it is, I had to write code that recall and processes all the files on tape 1, *wherever they are in the file system*, free them from the source disk, recall and process all the files on tape 2, repeat until tape 3,857. (And due to funding issues 5 years ago which turned into a "who paid for what tapes" food fight, most of the tapes ended up with files from entirely different file systems on them, going into different filesets on the destination). (And in fact, the migration is currently hosed up because a researcher *is* doing pretty much that - recalling all the files from one directory, then the next, then the next, to get files they need urgently for a deliverable but haven't been moved to the new system. So rather than having 12 LTO-5 drives to multistream the tape recalls, I've got 12 recalls fighting for one drive while the researcher's processing is hogging the other 11, due to the way the previous system prioritizes in-line opens of files versus bulk recalls) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From scbatche at us.ibm.com Tue Jul 25 21:46:45 2017 From: scbatche at us.ibm.com (Scott C Batchelder) Date: Tue, 25 Jul 2017 15:46:45 -0500 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf Message-ID: Hello: I am wondering if I can get some more information on the gpfsperf tool for baseline testing GPFS. I want to record GPFS read and write performance for a file system on the cluster before I enable DMAPI and configure the HSM interface. The README for the tool does not offer much insight in how I should run this tool based on the cluster or file system settings. The cluster that I will be running this tool on will not have MPI installed and will have multiple file systems in the cluster. Are there some best practises for running this tool? 
For example: - Should the number of threads equal the number of NSDs for the file system? or equal to the number of nodes? - If I execute a large multi-threaded run of this tool from a single node in the cluster, will that give me an accurate result of the performance of the file system? Any feedback is appreciated. Thanks. Sincerely, Scott Batchelder Phone: 1-281-883-7926 E-mail: scbatche at us.ibm.com 12301 Kurland Dr Houston, TX 77034-4812 United States -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 2022 bytes Desc: not available URL: From valdis.kletnieks at vt.edu Wed Jul 26 00:59:08 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 25 Jul 2017 19:59:08 -0400 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf In-Reply-To: References: Message-ID: <13777.1501027148@turing-police.cc.vt.edu> On Tue, 25 Jul 2017 15:46:45 -0500, "Scott C Batchelder" said: > - Should the number of threads equal the number of NSDs for the file > system? or equal to the number of nodes? Depends on what definition of "throughput" you are interested in. If your configuration has 50 clients banging on 5 NSD servers, your numbers for 5 threads and 50 threads are going to tell you subtly different things... (Basically, one thread per NSD is going to tell you the maximum that one client can expect to get with little to no contention, while one per client will tell you about the maximum *aggregate* that all 50 can get together - which is probably still giving each individual client less throughput than one-to-one....) We usually test with "exactly one thread total", "one thread per server", and "keep piling the clients on till the total number doesn't get any bigger". Also be aware that it only gives you insight to your workload performance if your workload is comprised of large file access - if your users are actually doing a lot of medium or small files, that changes the results dramatically as you end up possibly pounding on metadata more than the actual data.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From varun.mittal at in.ibm.com Wed Jul 26 04:42:27 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Wed, 26 Jul 2017 09:12:27 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. 
Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From varun.mittal at in.ibm.com Wed Jul 26 04:44:24 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Wed, 26 Jul 2017 09:14:24 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. 
this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Wed Jul 26 18:28:55 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Jul 2017 17:28:55 +0000 Subject: [gpfsug-discuss] Lost disks Message-ID: I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it's due to a back end disk issue or if it's a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn't appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren't 100% sure that something at the disk array couldn't have caused this. Is there an easy way to see if there is still data on these disks? Short of a full restore from backup what other options might they have? The mmlsnsd -X show's blanks for device and device type now. # mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- INGEST_FILEMGR_xis2301 0A23982E57FD995D - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2301 0A23982E57FD995D - - ingest-filemgr02.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - ingest-filemgr02.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2303 0A23982E57FD9962 - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. 
If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you.

Sirius Computer Solutions

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kums at us.ibm.com Wed Jul 26 18:37:45 2017
From: kums at us.ibm.com (Kumaran Rajaram)
Date: Wed, 26 Jul 2017 13:37:45 -0400
Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf
In-Reply-To: <13777.1501027148@turing-police.cc.vt.edu>
References: <13777.1501027148@turing-police.cc.vt.edu>
Message-ID:

Hi Scott,

>>- Should the number of threads equal the number of NSDs for the file system? or equal to the number of nodes?
>>- If I execute a large multi-threaded run of this tool from a single node in the cluster, will that give me an accurate result of the performance of the file system?

To add to Valdis's note, the answer to the above also depends on the node, the network used for GPFS communication between client and server, and the storage performance capabilities constituting the GPFS cluster/network/storage stack.

As an example, say the storage subsystem (including controller + disks) hosting the file system can deliver ~20 GB/s and the networking between NSD client and server is FDR 56Gb/s Infiniband (with verbsRdma = ~6GB/s). Assuming one FDR-IB link (verbsPorts) is configured per NSD server as well as per client, you would need a minimum of 4 x NSD servers (4 x 6GB/s ==> 24 GB/s) to saturate the backend storage. So you would need to run gpfsperf (or any other parallel I/O benchmark) across a minimum of 4 x GPFS NSD clients to saturate the backend storage. You can scale the gpfsperf thread counts (-th parameter) depending on access pattern (buffered/dio etc.), but this would only be able to drive load from a single NSD client node.

If you would like to drive I/O load from multiple NSD client nodes and synchronize the parallel runs across multiple nodes for accuracy, then gpfsperf-mpi would be strongly recommended. You would need to use MPI to launch gpfsperf-mpi across multiple NSD client nodes and scale the MPI processes (across NSD clients, with 1 or more MPI processes per NSD client) accordingly to drive the I/O load for good performance.

>>The cluster that I will be running this tool on will not have MPI installed and will have multiple file systems in the cluster.

Without MPI, the alternative would be to use ssh or pdsh to launch gpfsperf across multiple nodes; however, if there are slow NSD clients then the results may not be accurate (slow clients take longer, and after the faster clients finish they get all the network/storage resources, skewing the performance analysis).

You may also consider using parallel Iozone, as it can be run across multiple nodes using rsh/ssh with a combination of the "-+m" and "-t" options.

http://iozone.org/docs/IOzone_msword_98.pdf

##
-+m filename
Use this file to obtain the configuration information of the clients for cluster testing. The file contains one line for each client. Each line has three fields. The fields are space delimited. A # sign in column zero is a comment line. The first field is the name of the client. The second field is the path, on the client, for the working directory where Iozone will execute. The third field is the path, on the client, for the executable Iozone. To use this option one must be able to execute commands on the clients without being challenged for a password. Iozone will start remote execution by using "rsh". To use ssh, export RSH=/usr/bin/ssh

-t #
Run Iozone in a throughput mode. This option allows the user to specify how many threads or processes to have active during the measurement.
##

Hope this helps,
-Kums

From: valdis.kletnieks at vt.edu
To: gpfsug main discussion list
Date: 07/25/2017 07:59 PM
Subject: Re: [gpfsug-discuss] Baseline testing GPFS with gpfsperf
Sent by: gpfsug-discuss-bounces at spectrumscale.org

On Tue, 25 Jul 2017 15:46:45 -0500, "Scott C Batchelder" said:
> - Should the number of threads equal the number of NSDs for the file
> system? or equal to the number of nodes?

Depends on what definition of "throughput" you are interested in. If your configuration has 50 clients banging on 5 NSD servers, your numbers for 5 threads and 50 threads are going to tell you subtly different things... (Basically, one thread per NSD is going to tell you the maximum that one client can expect to get with little to no contention, while one per client will tell you about the maximum *aggregate* that all 50 can get together - which is probably still giving each individual client less throughput than one-to-one....)

We usually test with "exactly one thread total", "one thread per server", and "keep piling the clients on till the total number doesn't get any bigger".

Also be aware that it only gives you insight to your workload performance if your workload is comprised of large file access - if your users are actually doing a lot of medium or small files, that changes the results dramatically as you end up possibly pounding on metadata more than the actual data....

[attachment "att0twxd.dat" deleted by Kumaran Rajaram/Arlington/IBM]
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Robert.Oesterlin at nuance.com Wed Jul 26 18:45:35 2017
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Wed, 26 Jul 2017 17:45:35 +0000
Subject: [gpfsug-discuss] Lost disks
Message-ID:

One way this could possibly happen would be a system being installed (I'm assuming this is Linux) while the FC adapter is active; the OS install will then see the disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening.)

If you don't lose all of the descriptors, it's sometimes possible to manually re-construct the missing header information - I'm assuming since you opened a PMR, IBM has looked at this. This is a scenario I've had to recover from - twice.

Back-end array issue seems unlikely to me; I'd keep looking at the systems with access to those LUNs and see what commands/operations could have been run.
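On the "is there still data on these disks" question, a cautious, read-only first look is sketched below; /dev/sdX is a placeholder, the exact strings present depend on the NSD format version, and anything beyond looking should go through IBM support:

# Read-only: scan the start of the LUN for recognisable NSD/GPFS markers
dd if=/dev/sdX bs=1M count=8 2>/dev/null | strings | grep -Ei 'nsd|gpfs|descriptor' | head

# What GPFS currently believes about the NSDs
/usr/lpp/mmfs/bin/mmlsnsd -X

# Whether something else (e.g. an OS installer) has written a new label or partition table
blkid /dev/sdX

On many levels there is also an unsupported "mmfsadm test readdescraw /dev/sdX" that attempts to decode whatever descriptors it can still find; treat its output (and mmfsadm in general) as diagnostic-only.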
Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: on behalf of Mark Bush
Reply-To: gpfsug main discussion list
Date: Wednesday, July 26, 2017 at 12:29 PM
To: "gpfsug-discuss at spectrumscale.org"
Subject: [EXTERNAL] [gpfsug-discuss] Lost disks

I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it's due to a back end disk issue or if it's a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn't appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren't 100% sure that something at the disk array couldn't have caused this.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From oehmes at gmail.com Wed Jul 26 19:18:38 2017
From: oehmes at gmail.com (Sven Oehme)
Date: Wed, 26 Jul 2017 18:18:38 +0000
Subject: [gpfsug-discuss] Lost disks
In-Reply-To: References: Message-ID:

It can happen for multiple reasons; one is a Linux install, but unfortunately there are significantly simpler explanations. Linux, as well as the BIOS in some servers, will from time to time look for empty disks and put a GPT label on a disk if it doesn't have one, etc. This thread explains a lot of this:

https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014439222

This is why we implemented the NSD V2 format a long time ago. Unfortunately there is no way to convert a V1 NSD to a V2 NSD on an existing filesystem, except to remove the NSDs one at a time and re-add them after you have upgraded the system to at least GPFS 4.1 (I would recommend a later version like 4.2.3). Some more details are here in this thread:

https://www.ibm.com/developerworks/community/forums/html/threadTopic?id=5c1ee5bc-41b8-4318-a74e-4d962f82ce2e

But a quick summary of the benefits of V2 are:

- Support for GPT NSD
- Adds a standard disk partition table (GPT type) to NSDs
- Disk label support for Linux
- New GPFS NSD v2 format provides the following benefits:
  - Includes a partition table so that the disk is recognized as a GPFS device
  - Adjusts data alignment to support disks with a 4 KB physical block size
  - Adds backup copies of some key GPFS data structures
  - Expands some reserved areas to allow for future growth

The main reason we can't convert from V1 to V2 is that the on-disk format changed significantly, so we would have to move on-disk data, which is very risky.

Hope that explains this.

Sven

On Wed, Jul 26, 2017 at 10:29 AM Mark Bush wrote:

> I have a client has had an issue where all of the nsd disks disappeared in
> the cluster recently. Not sure if it's due to a back end disk issue or if
> it's a reboot that did it. But in their PMR they were told that all that
> data is lost now and that the disk headers didn't appear as GPFS disk
> headers. How on earth could something like that happen? Could it be a
> backend disk thing? They are confident that nobody tried to reformat disks
> but aren't 100% sure that something at the disk array couldn't have caused
> this.
>
> Is there an easy way to see if there is still data on these disks?
>
> Short of a full restore from backup what other options might they have?
>
> The mmlsnsd -X shows blanks for device and device type now.
> > > > # mmlsnsd -X > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > INGEST_FILEMGR_xis2301 0A23982E57FD995D - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2301 0A23982E57FD995D - - > ingest-filemgr02.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - > ingest-filemgr02.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2303 0A23982E57FD9962 - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > > > > > *Mark* > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > Sirius Computer Solutions > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Jul 26 19:19:15 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Jul 2017 18:19:15 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. 
But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 26 20:05:59 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 26 Jul 2017 19:05:59 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: IBM has a procedure for it that may work in some cases, but you?re manually editing the NSD descriptors on disk. Contact IBM if you think an NSD has been lost to descriptor being re-written. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 1:19 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Jul 27 11:39:28 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 27 Jul 2017 10:39:28 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: Mark, I once rescued a system which had the disk partition on the OS disks deleted. (This was a system with a device mapper RAID pair of OS disks). Download a copy of sysrescue http://www.system-rescue-cd.org/ and create a bootable USB stick (or network boot). When you boot the system in sysrescue it has a utility to scan disks which will identify existing partitions, even if the partition table has been erased. I can?t say if this will do anything with the disks in your system, but this is certainly worth a try if you suspect that the data is all still on disk. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: Wednesday, July 26, 2017 8:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. 
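If I remember right, the partition scanner that ships on that rescue CD is testdisk. A non-destructive first look at what it can still see on a disk would be something like:

testdisk /list /dev/sdX    # lists whatever partition structures testdisk can find, without writing anything

where /dev/sdX is only a placeholder for the LUN in question.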
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Thu Jul 27 11:58:08 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 11:58:08 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: <1501153088.26563.39.camel@buzzard.me.uk> On Wed, 2017-07-26 at 17:45 +0000, Oesterlin, Robert wrote: > One way this could possible happen would be a system is being > installed (I?m assuming this is Linux) and the FC adapter is active; > then the OS install will see disks and wipe out the NSD descriptor on > those disks. (Which is why the NSD V2 format was invented, to prevent > this from happening) If you don?t lose all of the descriptors, it?s > sometimes possible to manually re-construct the missing header > information - I?m assuming since you opened a PMR, IBM has looked at > this. This is a scenario I?ve had to recover from - twice. Back-end > array issue seems unlikely to me, I?d keep looking at the systems with > access to those LUNs and see what commands/operations could have been > run. I would concur that this is the most likely scenario; an install where for whatever reason the machine could see the disks and they are gone. I know that RHEL6 and its derivatives will do that for you. Has happened to me at previous place of work where another admin forgot to de-zone a server, went to install CentOS6 as part of a cluster upgrade from CentOS5 and overwrote all the NSD descriptors. Thing is GPFS does not look at the NSD descriptors that much. So in my case it was several days before it was noticed, and only then because I rebooted the last NSD server as part of a rolling upgrade of GPFS. I could have cruised for weeks/months with no NSD descriptors if I had not restarted all the NSD servers. The moral of this is the overwrite could have take place quite some time ago. Basically if the disks are all missing then the NSD descriptor has been overwritten, and the protestations of the client are irrelevant. The chances of the disk array doing it to *ALL* the disks is somewhere around ? IMHO. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From richard.rupp at us.ibm.com Thu Jul 27 12:28:35 2017 From: richard.rupp at us.ibm.com (RICHARD RUPP) Date: Thu, 27 Jul 2017 07:28:35 -0400 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: If you are under IBM support, leverage IBM for help. A third party utility has the possibility of making it worse. From: John Hearns To: gpfsug main discussion list Date: 07/27/2017 06:40 AM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org Mark, I once rescued a system which had the disk partition on the OS disks deleted. (This was a system with a device mapper RAID pair of OS disks). Download a copy of sysrescue http://www.system-rescue-cd.org/ and create a bootable USB stick (or network boot). When you boot the system in sysrescue it has a utility to scan disks which will identify existing partitions, even if the partition table has been erased. I can?t say if this will do anything with the disks in your system, but this is certainly worth a try if you suspect that the data is all still on disk. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: Wednesday, July 26, 2017 8:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. 
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush < Mark.Bush at siriuscom.com> Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Thu Jul 27 12:58:50 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 12:58:50 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: <1501156730.26563.49.camel@strath.ac.uk> On Thu, 2017-07-27 at 07:28 -0400, RICHARD RUPP wrote: > If you are under IBM support, leverage IBM for help. A third party > utility has the possibility of making it worse. > The chances of recovery are slim in the first place from this sort of problem. At least with v1 NSD descriptors. Further IBM have *ALREADY* told him the data is lost, I quote But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. So in this scenario you have little to loose trying something because you are now on your own. Worst case scenario is that whatever you try does not work, which leave you no worse of than you are now. Well apart from lost time for the restore, but you might have started that already to somewhere else. I was once told by IBM (nine years ago now) that my GPFS file system was caput and to arrange a restore from tape. At which point some fiddling by myself fixed the problem and a 100TB restore was no longer required. However this was not due to overwritten NSD descriptors. When that happened the two file systems effected had to be restored. Well bizarrely one was still mounted and I was able to rsync the data off. However the point is that at this stage fiddling with third party tools is the only option left. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From UWEFALKE at de.ibm.com Thu Jul 27 15:18:02 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 27 Jul 2017 16:18:02 +0200 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501156730.26563.49.camel@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> Message-ID: "Just doing something" makes things worse usually. Whether a 3rd party tool knows how to handle GPFS NSDs can be doubted (as long as it is not dedicated to that purpose). First, I'd look what is actually on the sectors where the NSD headers used to be, and try to find whether data beyond that area were also modified (if the latter is the case, restoring the NSDs does not make much sense as data and/or metadata (depending on disk usage) would also be corrupted. If you are sure that just the NSD header area has been affected, you might try to trick GPFS in getting just the information into the header area needed that GPFS recognises the devices as the NSDs they were. 
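Purely as an illustrative sketch, and with the device and file names below being placeholders rather than anything from the affected cluster, the relevant sectors can be captured and inspected read-only before anything is amended:

dd if=/dev/dm-0 of=/tmp/dm-0.head bs=512 count=8     # copy the first 4 KiB of the LUN to a file; nothing is written to the disk
strings /tmp/dm-0.head                               # an intact v1 header contains a readable "NSD descriptor ... created by GPFS" line
od --address-radix=x -xc /tmp/dm-0.head              # full hex/character dump for closer inspection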
The first 4 kiB of a v1 NSD from a VM on my laptop look like $ cat nsdv1head | od --address-radix=x -xc 000000 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000200 cf70 4192 0000 0100 0000 3000 e930 a028 p 317 222 A \0 \0 \0 001 \0 \0 \0 0 0 351 ( 240 000210 a8c0 ce7a a251 1f92 a251 1a92 0000 0800 300 250 z 316 Q 242 222 037 Q 242 222 032 \0 \0 \0 \b 000220 0000 f20f 0000 0000 0000 0000 0000 0000 \0 \0 017 362 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000230 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000400 93d2 7885 0000 0100 0000 0002 141e 64a8 322 223 205 x \0 \0 \0 001 \0 \0 002 \0 036 024 250 d 000410 a8c0 ce7a a251 3490 0000 fa0f 0000 0800 300 250 z 316 Q 242 220 4 \0 \0 017 372 \0 \0 \0 \b 000420 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000480 534e 2044 6564 6373 6972 7470 726f 6620 N S D d e s c r i p t o r f 000490 726f 2f20 6564 2f76 6476 2062 7263 6165 o r / d e v / v d b c r e a 0004a0 6574 2064 7962 4720 4650 2053 6f4d 206e t e d b y G P F S M o n 0004b0 614d 2079 3732 3020 3a30 3434 303a 2034 M a y 2 7 0 0 : 4 4 : 0 4 0004c0 3032 3331 000a 0000 0000 0000 0000 0000 2 0 1 3 \n \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 0004d0 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000e00 4c5f 4d56 0000 017d 0000 017d 0000 017d _ L V M \0 \0 } 001 \0 \0 } 001 \0 \0 } 001 000e10 0000 017d 0000 0000 0000 0000 0000 0000 \0 \0 } 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000e20 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000e30 0000 0000 0000 0000 0000 0000 017d 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 } 001 \0 \0 000e40 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 001000 I suppose, the important area starts at 0x0200 (ie. with the second 512Byte sector) and ends at 0x04df (which would be within the 3rd 512Bytes sector, hence the 2nd and 3rd sectors appear crucial). I think that there is some more space before the payload area starts. Without knowledge what exactly has to go into the header, I'd try to create an NSD on one or two (new) disks, save the headers, then create an FS on them, save the headers again, check if anything has changed. So, creating some new NSDs, checking what keys might appear there and in the cluster configuration could get you very close to craft the header information which is gone. Of course, that depends on how dear the data on the gone FS AKA SG are and how hard it'd be to rebuild them otherwise (replay from backup, recalculate, ...) It seems not a bad idea to set aside the NSD headers of your NSDs in a back up :-) And also now: Before amending any blocks on your disks, save them! Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/27/2017 01:59 PM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org On Thu, 2017-07-27 at 07:28 -0400, RICHARD RUPP wrote: > If you are under IBM support, leverage IBM for help. A third party > utility has the possibility of making it worse. > The chances of recovery are slim in the first place from this sort of problem. At least with v1 NSD descriptors. Further IBM have *ALREADY* told him the data is lost, I quote But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. So in this scenario you have little to loose trying something because you are now on your own. Worst case scenario is that whatever you try does not work, which leave you no worse of than you are now. Well apart from lost time for the restore, but you might have started that already to somewhere else. I was once told by IBM (nine years ago now) that my GPFS file system was caput and to arrange a restore from tape. At which point some fiddling by myself fixed the problem and a 100TB restore was no longer required. However this was not due to overwritten NSD descriptors. When that happened the two file systems effected had to be restored. Well bizarrely one was still mounted and I was able to rsync the data off. However the point is that at this stage fiddling with third party tools is the only option left. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Thu Jul 27 16:09:31 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 16:09:31 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: <1501156730.26563.49.camel@strath.ac.uk> Message-ID: <1501168171.26563.56.camel@strath.ac.uk> On Thu, 2017-07-27 at 16:18 +0200, Uwe Falke wrote: > "Just doing something" makes things worse usually. Whether a 3rd > party tool knows how to handle GPFS NSDs can be doubted (as long as it > is not dedicated to that purpose). It might usually, but IBM have *ALREADY* given up in this case and told the customer their data is toast. Under these circumstances other than wasting time that could have been spent profitably on a restore it is *IMPOSSIBLE* to make the situation worse. [SNIP] > It seems not a bad idea to set aside the NSD headers of your NSDs in a > back up :-) > And also now: Before amending any blocks on your disks, save them! > It's called NSD v2 descriptor format, so rather than use raw disks they are in a GPT partition, and for good measure a backup copy is stored at the end of the disk too. Personally if I had any v1 NSD's in a file system I would have a plan for a series of mmdeldisk/mmcrnsd/mmadddisk to get them all to v2 sooner rather than later. JAB. -- Jonathan A. 
Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Robert.Oesterlin at nuance.com Thu Jul 27 16:28:02 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 27 Jul 2017 15:28:02 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format each is? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jul 27 16:51:29 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 27 Jul 2017 17:51:29 +0200 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501168171.26563.56.camel@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> Message-ID: gpfsug-discuss-bounces at spectrumscale.org wrote on 07/27/2017 05:09:31 PM: > From: Jonathan Buzzard > To: gpfsug main discussion list > Date: 07/27/2017 05:09 PM > Subject: Re: [gpfsug-discuss] Lost disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On Thu, 2017-07-27 at 16:18 +0200, Uwe Falke wrote: > > > "Just doing something" makes things worse usually. Whether a 3rd > > party tool knows how to handle GPFS NSDs can be doubted (as long as it > > is not dedicated to that purpose). > > It might usually, but IBM have *ALREADY* given up in this case and told > the customer their data is toast. Under these circumstances other than > wasting time that could have been spent profitably on a restore it is > *IMPOSSIBLE* to make the situation worse. SCNR: It is always possible to make things worse. However, of course, if the efforts to do research on that system appear too expensive compared to the possible gain, then it is wise to give up and restore data from backup to a new file system. > > [SNIP] > > > It seems not a bad idea to set aside the NSD headers of your NSDs in a > > back up :-) > > And also now: Before amending any blocks on your disks, save them! > > > > It's called NSD v2 descriptor format, so rather than use raw disks they > are in a GPT partition, and for good measure a backup copy is stored at > the end of the disk too. > > Personally if I had any v1 NSD's in a file system I would have a plan > for a series of mmdeldisk/mmcrnsd/mmadddisk to get them all to v2 sooner > rather than later. Yep, but I suppose the gone NSDs were v1. Then, there might be some restrictions blocking the move from NSDv1 to NSDv2 (old FS level still req.ed, or just the hugeness of a file system). And you never know, if some tool runs wild due to logical failures it overwrites all GPT copies on a disk and you're lost again (but of course NSDv2 has been a tremendous step ahead). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From luke.raimbach at googlemail.com Thu Jul 27 17:09:42 2017 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Thu, 27 Jul 2017 16:09:42 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> References: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: mmfsadm test readdescraw On Thu, 27 Jul 2017, 16:28 Oesterlin, Robert, wrote: > I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format > each is? > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Jul 27 17:17:20 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 27 Jul 2017 16:17:20 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <50669E00-32A8-4AC7-A729-CB961F96ECAE@nuance.com> Right - but what field do I look at? Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Luke Raimbach Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 11:10 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? mmfsadm test readdescraw -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Jul 27 19:26:45 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 19:26:45 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> Message-ID: <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> On 27/07/17 16:51, Uwe Falke wrote: [SNIP] > SCNR: It is always possible to make things worse. > However, of course, if the efforts to do research on that system appear > too expensive compared to the possible gain, then it is wise to give up > and restore data from backup to a new file system. > Explain to me when IBM have washed their hands of the situation; that is they deem the file system unrecoverable and will take no further action to help the customer, how under these circumstances it is possible for it to get any worse attempting to recover the situation yourself? The answer is you can't so and are talking complete codswallop. In general you are right, in this situation you are utterly and totally wrong. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From chair at spectrumscale.org Thu Jul 27 21:19:15 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Thu, 27 Jul 2017 21:19:15 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> Message-ID: Guys, this is supposed to be a community mailing list where people can come and ask questions and we can have healthy debate, but please can we keep it calm? Thanks Simon Group Chair From sfadden at us.ibm.com Thu Jul 27 21:33:19 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Thu, 27 Jul 2017 20:33:19 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: References: , <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Jul 28 00:29:47 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 28 Jul 2017 00:29:47 +0100 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> References: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: On 27/07/17 16:28, Oesterlin, Robert wrote: > I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format > each is? Well on anything approaching a recent Linux lsblk should as I understand it should show GPT partitions on v2 NSD's. Normally a v1 NSD would show up as a raw block device. I guess you could have created the v1 NSD's inside a partition but that was not normal practice. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From daniel.kidger at uk.ibm.com Fri Jul 28 12:03:40 2017 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Fri, 28 Jul 2017 11:03:40 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: , <1501156730.26563.49.camel@strath.ac.uk><1501168171.26563.56.camel@strath.ac.uk><3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 28 12:46:47 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 28 Jul 2017 11:46:47 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Scott Fadden Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 3:33 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? 
# mmfsadm test readdescraw /dev/dm-14 | grep " original format" original format version 1600, cur version 1700 (mgr 1700, helper 1700, mnode 1700) The harder part is what version number = v2 and what matches version 1. The real answer is there is not a simple one, it is not really v1 vs v2 it is what feature you are interested in. Just one small example 4K Disk SECTOR support started in 1403 Dynamically enabling quotas started in 1404 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 28 13:44:11 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 28 Jul 2017 12:44:11 +0000 Subject: [gpfsug-discuss] LROC example Message-ID: <8103C497-EFA2-41E3-A047-4C3A3AA3EC0B@nuance.com> For those of you considering LROC, you may find this interesting. LROC can be very effective in some job mixes, as shown below. This is in a compute cluster of about 400 nodes. Each compute node has a 100GB LROC. In this particular job mix, LROC was recalling 3-4 times the traffic that was going to the NSDs. I see other cases where?s it?s less effective. [cid:image001.png at 01D30775.4ACF3D20] Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 54425 bytes Desc: image001.png URL: From knop at us.ibm.com Fri Jul 28 13:44:26 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 28 Jul 2017 08:44:26 -0400 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> References: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Message-ID: Bob, I believe the NSD format version (v1 vs v2) is shown in the " format version" line that starts with "NSDid" : # mmfsadm test readdescraw /dev/dm-11 NSD descriptor in sector 64 of /dev/dm-11 NSDid: 9461C0A85788693A format version: 1403 Label: It should say "1403" when the format is v2. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 07/28/2017 07:47 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Scott Fadden Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 3:33 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? 
# mmfsadm test readdescraw /dev/dm-14 | grep " original format" original format version 1600, cur version 1700 (mgr 1700, helper 1700, mnode 1700) The harder part is what version number = v2 and what matches version 1. The real answer is there is not a simple one, it is not really v1 vs v2 it is what feature you are interested in. Just one small example 4K Disk SECTOR support started in 1403 Dynamically enabling quotas started in 1404 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gcorneau at us.ibm.com Fri Jul 28 20:07:54 2017 From: gcorneau at us.ibm.com (Glen Corneau) Date: Fri, 28 Jul 2017 14:07:54 -0500 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: References: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Message-ID: Just a note for my AIX folks out there (and I know there's at least one!): When NSDv2 (version 1403) disks are defined in AIX we *don't* create GPTs on those LUNs. However with GPFS (Spectrum Scale) installed on AIX we will place the NSD name in the "VG" column of lsvg. But yes, we've had situations of customers creating new VGs on existing GPFS LUNs (force!) and destroying file systems. ------------------ Glen Corneau Power Systems Washington Systems Center gcorneau at us.ibm.com From: "Felipe Knop" To: gpfsug main discussion list Date: 07/28/2017 07:45 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Bob, I believe the NSD format version (v1 vs v2) is shown in the " format version" line that starts with "NSDid" : # mmfsadm test readdescraw /dev/dm-11 NSD descriptor in sector 64 of /dev/dm-11 NSDid: 9461C0A85788693A format version: 1403 Label: It should say "1403" when the format is v2. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 07/28/2017 07:47 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Sun Jul 30 04:22:25 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Sat, 29 Jul 2017 23:22:25 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? 
In-Reply-To: <1500908233.4387.194.camel@buzzard.me.uk> References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> <1500908233.4387.194.camel@buzzard.me.uk> Message-ID: Jonathan, all, We'll be introducing some clarification into the publications to highlight that data is not stored in the inode for encrypted files. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/24/2017 10:57 AM Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Sent by: gpfsug-discuss-bounces at spectrumscale.org On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: > Hey all, > > On the documentation of encryption restrictions and encryption/HAWC > interplay... > > The encryption documentation currently states: > > "Secure storage uses encryption to make data unreadable to anyone who > does not possess the necessary encryption keys...Only data, not > metadata, is encrypted." > > The HAWC restrictions include: > > "Encrypted data is never stored in the recovery log..." > > If this is unclear, I'm open to suggestions for improvements. > Just because *DATA* is stored in the metadata does not make it magically metadata. It's still data so you could quite reasonably conclude that it is encrypted. We have now been disabused of this, but the documentation is not clear and needs clarifying. Perhaps say metadata blocks are not encrypted. Or just a simple data stored in inodes is not encrypted would suffice. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Jul 31 05:57:44 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 31 Jul 2017 00:57:44 -0400 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501153088.26563.39.camel@buzzard.me.uk> References: <1501153088.26563.39.camel@buzzard.me.uk> Message-ID: Jonathan, Regarding >> Thing is GPFS does not look at the NSD descriptors that much. So in my >> case it was several days before it was noticed, and only then because I >> rebooted the last NSD server as part of a rolling upgrade of GPFS. I >> could have cruised for weeks/months with no NSD descriptors if I had not >> restarted all the NSD servers. The moral of this is the overwrite could >> have take place quite some time ago. While GPFS does not normally read the NSD descriptors in the course of performing file system operations, as of 4.1.1 a periodic check is done on the content of various descriptors, and a message like [E] On-disk NSD descriptor of is valid but has a different ID. 
ID in cache is and ID on-disk is should get issued if the content of the descriptor on disk changes. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/27/2017 06:58 AM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, 2017-07-26 at 17:45 +0000, Oesterlin, Robert wrote: > One way this could possible happen would be a system is being > installed (I?m assuming this is Linux) and the FC adapter is active; > then the OS install will see disks and wipe out the NSD descriptor on > those disks. (Which is why the NSD V2 format was invented, to prevent > this from happening) If you don?t lose all of the descriptors, it?s > sometimes possible to manually re-construct the missing header > information - I?m assuming since you opened a PMR, IBM has looked at > this. This is a scenario I?ve had to recover from - twice. Back-end > array issue seems unlikely to me, I?d keep looking at the systems with > access to those LUNs and see what commands/operations could have been > run. I would concur that this is the most likely scenario; an install where for whatever reason the machine could see the disks and they are gone. I know that RHEL6 and its derivatives will do that for you. Has happened to me at previous place of work where another admin forgot to de-zone a server, went to install CentOS6 as part of a cluster upgrade from CentOS5 and overwrote all the NSD descriptors. Thing is GPFS does not look at the NSD descriptors that much. So in my case it was several days before it was noticed, and only then because I rebooted the last NSD server as part of a rolling upgrade of GPFS. I could have cruised for weeks/months with no NSD descriptors if I had not restarted all the NSD servers. The moral of this is the overwrite could have take place quite some time ago. Basically if the disks are all missing then the NSD descriptor has been overwritten, and the protestations of the client are irrelevant. The chances of the disk array doing it to *ALL* the disks is somewhere around ? IMHO. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Mon Jul 31 18:30:34 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 31 Jul 2017 17:30:34 +0000 Subject: [gpfsug-discuss] Auditing Message-ID: Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? 
Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Mon Jul 31 18:44:21 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 17:44:21 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement Message-ID: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Mon Jul 31 18:54:52 2017 From: eric.wonderley at vt.edu (J. 
Eric Wonderley) Date: Mon, 31 Jul 2017 13:54:52 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar < Renar.Grunenberg at huk-coburg.de> wrote: > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the enforcement > of hardlimit definitions on a flieset quota. What we see is we put some 200 > GB files on following quota definitions: quota 150 GB Limit 250 GB Grace > none. > After the creating of one 200 GB we hit the softquota limit, thats ok. But > After the the second file was created!! we expect an io error but it don?t > happen. We define all well know Parameters (-Q,..) on the filesystem . Is > this a bug or a Feature? mmcheckquota are already running at first. > Regards Renar. > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ------------------------------ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > ------------------------------ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfosburg at mdanderson.org Mon Jul 31 18:56:46 2017 From: jfosburg at mdanderson.org (Fosburgh,Jonathan) Date: Mon, 31 Jul 2017 17:56:46 +0000 Subject: [gpfsug-discuss] Auditing In-Reply-To: References: Message-ID: At present there is not a method to audit file access. Jonathan Fosburgh Principal Application Systems Analyst Storage Team IT Operations jfosburg at mdanderson.org (713) 745-9346 On 07/31/2017 12:30 PM, Mark Bush wrote: Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? 
Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The information contained in this e-mail message may be privileged, confidential, and/or protected from disclosure. This e-mail message may contain protected health information (PHI); dissemination of PHI should comply with applicable federal and state laws. If you are not the intended recipient, or an authorized representative of the intended recipient, any further review, disclosure, use, dissemination, distribution, or copying of this message or any attachment (or the information contained therein) is strictly prohibited. If you think that you have received this e-mail message in error, please notify the sender by return e-mail and delete all references to it and its contents from your systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jul 31 19:02:30 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 31 Jul 2017 18:02:30 +0000 Subject: [gpfsug-discuss] Re Auditing Message-ID: We run a policy that looks like this: -- cut here -- define(daysToEpoch, days(timestamp('1970-01-01 00:00:00.0'))) define(unixTS, char(int( (( days(\$1) - daysToEpoch ) * 86400) + ( hour(\$1) * 3600) + (minute(\$1) * 60) + (second(\$1)) )) ) rule 'dumpall' list '"$filesystem"' DIRECTORIES_PLUS SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' ) -- cut here -- Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Monday, July 31, 2017 at 12:31 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Auditing Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Mon Jul 31 19:05:37 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 31 Jul 2017 18:05:37 +0000 Subject: [gpfsug-discuss] Re Auditing In-Reply-To: References: Message-ID: Brilliant. Thanks Bob. 
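For anyone wanting to try a rule like the one Bob posted: saved to a file, it can be driven in list-only mode with mmapplypolicy, roughly as in the sketch below. The file system name, policy file name and output prefix are placeholders, and depending on the release a matching EXTERNAL LIST rule may also be needed.

mmapplypolicy gpfs01 -P /tmp/dumpall.pol -I defer -f /tmp/dumpall    # writes the candidate list files under the /tmp/dumpall prefix without executing anything against the data

The resulting list file(s) contain one record per selected file with the SHOW() fields appended, which can then be fed into whatever reporting is needed.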
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Monday, July 31, 2017 1:03 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Re Auditing We run a policy that looks like this: -- cut here -- define(daysToEpoch, days(timestamp('1970-01-01 00:00:00.0'))) define(unixTS, char(int( (( days(\$1) - daysToEpoch ) * 86400) + ( hour(\$1) * 3600) + (minute(\$1) * 60) + (second(\$1)) )) ) rule 'dumpall' list '"$filesystem"' DIRECTORIES_PLUS SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' ) -- cut here -- Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Monday, July 31, 2017 at 12:31 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Auditing Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Jul 31 19:26:52 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 31 Jul 2017 14:26:52 -0400 Subject: [gpfsug-discuss] Re Auditing - timestamps In-Reply-To: References: Message-ID: The "ILM" chapter in the Admin Guide has some tips, among which: 18. You can convert a time interval value to a number of seconds with the SQL cast syntax, as in the following example: define([toSeconds],[(($1) SECONDS(12,6))]) define([toUnixSeconds],[toSeconds($1 - ?1970-1-1 at 0:00?)]) RULE external list b RULE list b SHOW(?sinceNow=? toSeconds(current_timestamp-modification_time) ) RULE external list c RULE list c SHOW(?sinceUnixEpoch=? toUnixSeconds(modification_time) ) The following method is also supported: define(access_age_in_days,( INTEGER(( (CURRENT_TIMESTAMP - ACCESS_TIME) SECONDS)) /(24*3600.0) ) ) RULE external list w exec ?? RULE list w weight(access_age_in_days) show(access_age_in_days) --marc of GPFS -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Mon Jul 31 19:46:53 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 31 Jul 2017 14:46:53 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: <20170731144653.160355y5whmerokd@support.scinet.utoronto.ca> Renar For as long as the usage is below the hard limit (space or inodes) and below the grace period you'll be able to write. I don't think you can set the grace period to an specific value as a quota parameter, such as none. That is set at the filesystem creation time. BTW, grace period limit has been a mystery to me for many years. My impression is that GPFS keeps changing it internally depending on the position of the moon. I think ours is 2 hours, but at times I can see users writing for longer. Jaime Quoting "Grunenberg, Renar" : > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the > enforcement of hardlimit definitions on a flieset quota. What we see > is we put some 200 GB files on following quota definitions: quota > 150 GB Limit 250 GB Grace none. > After the creating of one 200 GB we hit the softquota limit, thats > ok. But After the the second file was created!! we expect an io > error but it don?t happen. We define all well know Parameters > (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota > are already running at first. > Regards Renar. > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
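(For readers following this thread, the usual way to see what the daemon actually thinks the limits and grace period are, and to pull the in_doubt figures back toward zero, is something along these lines; "gpfs01" and "somefileset" are example names, not Renar's real ones, and the exact mmsetquota/mmedquota syntax should be checked against your 4.2.x man pages:)

-- cut here --
# Show current usage, soft/hard limits and grace for one fileset
mmlsquota -j somefileset gpfs01

# Re-count usage so the in_doubt column drops back to (near) zero
mmcheckquota gpfs01

# Set block soft and hard limits on the fileset (newer-style syntax)
mmsetquota gpfs01:somefileset --block 150G:250G

# The grace period is a per-filesystem default, edited interactively
# with mmedquota -t; it is not part of the per-fileset limit itself.
mmedquota -t -j
-- cut here --

(The behaviour under discussion: once usage passes the soft limit the grace clock starts, and writes only begin to fail when the hard limit is reached or the grace period expires, whichever comes first.)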
From Renar.Grunenberg at huk-coburg.de Mon Jul 31 20:04:56 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 19:04:56 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 
9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 31 20:21:46 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 31 Jul 2017 19:21:46 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar, I?m sure this is the case, but I don?t see anywhere in this thread where this is explicitly stated ? you?re not doing your tests as root, are you? root, of course, is not bound by any quotas. Kevin On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > wrote: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. 
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Mon Jul 31 20:30:20 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 19:30:20 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hallo Kevin, thanks for your hint i will check these tomorrow, and yes as root, lol. Regards Renar Renar Grunenberg Abteilung Informatik ? 
Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Buterbaugh, Kevin L Gesendet: Montag, 31. Juli 2017 21:22 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar, I?m sure this is the case, but I don?t see anywhere in this thread where this is explicitly stated ? you?re not doing your tests as root, are you? root, of course, is not bound by any quotas. Kevin On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > wrote: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. 
If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Mon Jul 31 21:03:53 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 31 Jul 2017 16:03:53 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> In addition, the in_doubt column is a function of the data turn-over and the internal gpfs accounting synchronization period (beyond root control). The higher the in_doubt values the less accurate the real amount of space/inodes a user/group/fileset has in the filesystem. What I noticed in practice is the the in_doubt values only get worst overtime, and work against the quotas, making them hit the limits sooner. Therefore, you may wish to run a 'mmcheckquota' crontab job once or twice a day, to reset the in_doubt column to zero mover often. GPFS has a very high lag to do this on its own in the most recent versions, and seldom really catches up on a very active filesystem. If your grace period is set to 7 days I can assure you that in an HPC environment it's the equivalent of not having quotas effectively. You should set it to 2 hours or 4 hours. In an environment such as ours a runway process can easily generate 500TB of data or 1 billion inodes in few hours, and choke the file system to all users/jobs. Jaime Quoting "Buterbaugh, Kevin L" : > Hi Renar, > > I?m sure this is the case, but I don?t see anywhere in this thread > where this is explicitly stated ? you?re not doing your tests as > root, are you? root, of course, is not bound by any quotas. > > Kevin > > On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > > > wrote: > > > Hallo J. Eric, hallo Jaime, > Ok after we hit the softlimit we see that the graceperiod are go to > 7 days. I think that?s the default. But was does it mean. > After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. > My interpretation now we can write many gb to the nospace-left event > in the filesystem. > But our intention is to restricted some application to write only to > the hardlimit in the fileset. Any hints to accomplish this? > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. 
> Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > > > > Von: > gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric > Wonderley > Gesendet: Montag, 31. Juli 2017 19:55 > An: gpfsug main discussion list > > > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement > > Hi Renar: > What does 'mmlsquota -j fileset filesystem' report? > I did not think you would get a grace period of none unless the > hardlimit=softlimit. > > On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > > > wrote: > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the > enforcement of hardlimit definitions on a flieset quota. What we see > is we put some 200 GB files on following quota definitions: quota > 150 GB Limit 250 GB Grace none. > After the creating of one 200 GB we hit the softquota limit, thats > ok. But After the the second file was created!! we expect an io > error but it don?t happen. We define all well know Parameters > (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota > are already running at first. > Regards Renar. > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > > Telefon: > > 09561 96-44110 > > Telefax: > > 09561 96-44104 > > E-Mail: > > Renar.Grunenberg at huk-coburg.de > > Internet: > > www.huk.de > > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. 
(MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 31 21:11:14 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 31 Jul 2017 20:11:14 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> Message-ID: <3789811E-523F-47AE-93F3-E2985DD84D60@vanderbilt.edu> Jaime, That?s heavily workload dependent. We run a traditional HPC cluster and have a 7 day grace on home and 14 days on scratch. By setting the soft and hard limits appropriately we?ve slammed the door on many a runaway user / group / fileset. YMMV? Kevin On Jul 31, 2017, at 3:03 PM, Jaime Pinto > wrote: If your grace period is set to 7 days I can assure you that in an HPC environment it's the equivalent of not having quotas effectively. You should set it to 2 hours or 4 hours. In an environment such as ours a runway process can easily generate 500TB of data or 1 billion inodes in few hours, and choke the file system to all users/jobs. Jaime ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Sat Jul 1 10:20:18 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Sat, 1 Jul 2017 10:20:18 +0100 Subject: [gpfsug-discuss] Mass UID migration suggestions In-Reply-To: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> References: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> Message-ID: On 30/06/17 16:20, hpc-luke at uconn.edu wrote: > Hello, > > We're trying to change most of our users uids, is there a clean way to > migrate all of one users files with say `mmapplypolicy`? We have to change the > owner of around 273539588 files, and my estimates for runtime are around 6 days. > > What we've been doing is indexing all of the files and splitting them up by > owner which takes around an hour, and then we were locking the user out while we > chown their files. I made it multi threaded as it weirdly gave a 10% speedup > despite my expectation that multi threading access from a single node would not > give any speedup. > > Generally I'm looking for advice on how to make the chowning faster. Would > spreading the chowning processes over multiple nodes improve performance? Should > I not stat the files before running lchown on them, since lchown checks the file > before changing it? I saw mention of inodescan(), in an old gpfsug email, which > speeds up disk read access, by not guaranteeing that the data is up to date. We > have a maintenance day coming up where all users will be locked out, so the file > handles(?) from GPFS's perspective will not be able to go stale. Is there a > function with similar constraints to inodescan that I can use to speed up this > process? My suggestion is to do some development work in C to write a custom program to do it for you. That way you can hook into the GPFS API to leverage the fast file system scanning API. Take a look at the tsbackup.C file in the samples directory. 
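(As a rough aside, the policy engine can at least produce and fan out the per-owner work lists that feed such a program. The sketch below is untested and the UID 4711, the script path and the node names are placeholders; the helper script itself, which would parse the pathnames out of the file lists it is handed and run lchown with the new UID, is not shown.)

-- cut here --
# Select one owner's files and hand them in batches to a helper script
cat > /tmp/chown-uid.pol <<'EOF'
/* script path and UID are placeholders */
RULE EXTERNAL LIST 'uidmove' EXEC '/usr/local/sbin/chown-batch.sh'
RULE 'pick' LIST 'uidmove' WHERE USER_ID = 4711
EOF

# Run the fast scan and spread the helper invocations over several nodes
mmapplypolicy gpfs01 -P /tmp/chown-uid.pol -N node1,node2,node3
-- cut here --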
Obviously this is going to require someone with appropriate coding skills to develop. On the other hand given it is a one off and input is strictly controlled so error checking is a one off, then couple hundred lines C tops. My tip for this would be load the new UID's into a sparse array so you can just use the current UID to index into the array for the new UID, for speeding things up. It burns RAM but these days RAM is cheap and plentiful and speed is the major consideration here. This should in theory be able to do this in a few hours with this technique. One thing to bear in mind is that once the UID change is complete you will have to backup the entire file system again. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From ilan84 at gmail.com Tue Jul 4 09:16:43 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 11:16:43 +0300 Subject: [gpfsug-discuss] Fail to mount file system Message-ID: Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I am trying to make it work. There are 2 nodes in a cluster: [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active The Cluster status is: [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: MyCluster.LH20-GPFS2 GPFS cluster id: 10777108240438931454 GPFS UID domain: MyCluster.LH20-GPFS2 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 There is a file system: [root at LH20-GPFS1 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- fs_gpfs01 nynsd1 (directly attached) fs_gpfs01 nynsd2 (directly attached) [root at LH20-GPFS1 ~]# On each Node, There is folder /fs_gpfs01 The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. Whilte executing mmmount i get exception: [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. What am i doing wrong ? From scale at us.ibm.com Tue Jul 4 09:36:43 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 14:06:43 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 lab"? Is the file system corrupted ? Maybe this error is then due to file system corruption. Can you once try: mmmount fs_gpfs01 -a If this does not work then try: mmmount -o rs fs_gpfs01 Let me know which mount is working. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: gpfsug-discuss at spectrumscale.org Date: 07/04/2017 01:47 PM Subject: [gpfsug-discuss] Fail to mount file system Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I am trying to make it work. There are 2 nodes in a cluster: [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active The Cluster status is: [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: MyCluster.LH20-GPFS2 GPFS cluster id: 10777108240438931454 GPFS UID domain: MyCluster.LH20-GPFS2 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 There is a file system: [root at LH20-GPFS1 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- fs_gpfs01 nynsd1 (directly attached) fs_gpfs01 nynsd2 (directly attached) [root at LH20-GPFS1 ~]# On each Node, There is folder /fs_gpfs01 The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. Whilte executing mmmount i get exception: [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. What am i doing wrong ? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 09:38:28 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 11:38:28 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: I mean the person tried to configure it... didnt do good job so now its me to continue On Jul 4, 2017 11:37, "IBM Spectrum Scale" wrote: > What exactly do you mean by "I have received existing corrupted GPFS > 4.2.2 lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. 
> > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > ------------------------------------------------------------ > --------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine > cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jul 4 11:54:52 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 4 Jul 2017 10:54:52 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Message-ID: Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... 
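(A quick way to see whether the protocol nodes have drifted apart in the meantime is to compare the installed package everywhere before planning the outage; "cesNodes" is the stock CES node class and may be named differently in your cluster:)

-- cut here --
# Compare gpfs.smb builds across all protocol nodes in one go
mmdsh -N cesNodes 'rpm -q gpfs.smb'
-- cut here --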
Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 11:56:20 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 13:56:20 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts From S.J.Thompson at bham.ac.uk Tue Jul 4 12:09:18 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 4 Jul 2017 11:09:18 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Message-ID: AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I?ve upgraded nodes one at a time over the course of a few days. 
Is the impact just that we won?t be supported, or will a hole open up beneath my feet and swallow me whole? I really don?t fancy the headache of getting approvals to get an outage of even 5 minutes at 6am?. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jul 4 12:12:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 4 Jul 2017 11:12:10 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 4 17:28:07 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 21:58:07 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: My bad gave the wrong command, the right one is: mmmount fs_gpfs01 -o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 17:46:17 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 19:46:17 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Yes I am ok with deleting. I follow a guide from john olsen at the ibm team from tuscon.. but the guide had steps after the gpfs setup... Is there step by step guide for gpfs cluster setup other than the one in the ibm site? Thank My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------ ------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111- 0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system ------------------------------ [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > ------------------------------------------------------------ --------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcatana at gmail.com Tue Jul 4 17:47:09 2017 From: jcatana at gmail.com (Josh Catana) Date: Tue, 4 Jul 2017 12:47:09 -0400 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Check /var/adm/ras/mmfs.log.latest The dmesg xfs bug is probably from boot if you look at the dmesg with -T to show the timestamp On Jul 4, 2017 12:29 PM, "IBM Spectrum Scale" wrote: > My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs > > Also can you send output of mmlsnsd -X, need to check device type of the > NSDs. > > Are you ok with deleting the file system and disks and building everything > from scratch? > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. 
> > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: IBM Spectrum Scale > Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main > discussion list > Date: 07/04/2017 04:26 PM > Subject: Re: [gpfsug-discuss] Fail to mount file system > ------------------------------ > > > > [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a > Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... > LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmdsh: LH20-GPFS1 remote shell process had return code 32. > LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle > mmdsh: LH20-GPFS2 remote shell process had return code 32. > mmmount: Command failed. Examine previous error messages to determine > cause. > > [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 > mmmount: Mount point can not be a relative path name: rs > [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 > mmmount: Mount point can not be a relative path name: rs > > > > I recieve in "dmesg": > > [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk > [ 141.363422] hvt_cn_callback: unexpected netlink message! > [ 141.366153] hvt_cn_callback: unexpected netlink message! > [ 4479.292850] tracedev: loading out-of-tree module taints kernel. > [ 4479.292888] tracedev: module verification failed: signature and/or > required key missing - tainting kernel > [ 4482.928413] ------------[ cut here ]------------ > [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 > xfs_do_writepage+0x537/0x550 [xfs]() > [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) > tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 > mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils > i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc > binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif > crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc > hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy > libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod > [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE > ------------ 3.10.0-514.21.2.el7.x86_64 #1 > > On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale > wrote: > > What exactly do you mean by "I have received existing corrupted GPFS > 4.2.2 > > lab"? > > Is the file system corrupted ? Maybe this error is then due to file > system > > corruption. > > > > Can you once try: mmmount fs_gpfs01 -a > > If this does not work then try: mmmount -o rs fs_gpfs01 > > > > Let me know which mount is working. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------ > ------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > > (GPFS), then please post it to the public IBM developerWroks Forum at > > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) > > and you have an IBM software maintenance contract please contact > > 1-800-237-5511 <(800)%20237-5511> in the United States or your local > IBM Service Center in > > other countries. > > > > The forum is informally monitored as time permits and should not be used > for > > priority messages to the Spectrum Scale (GPFS) team. 
> > > > > > > > From: Ilan Schwarts > > To: gpfsug-discuss at spectrumscale.org > > Date: 07/04/2017 01:47 PM > > Subject: [gpfsug-discuss] Fail to mount file system > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ________________________________ > > > > > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > > am trying to make it work. > > There are 2 nodes in a cluster: > > [root at LH20-GPFS1 ~]# mmgetstate -a > > > > Node number Node name GPFS state > > ------------------------------------------ > > 1 LH20-GPFS1 active > > 3 LH20-GPFS2 active > > > > The Cluster status is: > > [root at LH20-GPFS1 ~]# mmlscluster > > > > GPFS cluster information > > ======================== > > GPFS cluster name: MyCluster.LH20-GPFS2 > > GPFS cluster id: 10777108240438931454 > > GPFS UID domain: MyCluster.LH20-GPFS2 > > Remote shell command: /usr/bin/ssh > > Remote file copy command: /usr/bin/scp > > Repository type: CCR > > > > Node Daemon node name IP address Admin node name Designation > > -------------------------------------------------------------------- > > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > > > There is a file system: > > [root at LH20-GPFS1 ~]# mmlsnsd > > > > File system Disk name NSD servers > > ------------------------------------------------------------ > --------------- > > fs_gpfs01 nynsd1 (directly attached) > > fs_gpfs01 nynsd2 (directly attached) > > > > [root at LH20-GPFS1 ~]# > > > > On each Node, There is folder /fs_gpfs01 > > The next step is to mount this fs_gpfs01 to be synced between the 2 > nodes. > > Whilte executing mmmount i get exception: > > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > > mmmount: Command failed. Examine previous error messages to determine > cause. > > > > > > What am i doing wrong ? > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > -- > > > - > Ilan Schwarts > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 4 19:15:49 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 23:45:49 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: You can refer to the concepts, planning and installation guide at the link ( https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1xx_library_prodoc.htm ) for finding detailed steps on setting up a cluster or creating a file system. Or open a PMR and work with IBM support to set it up. 
In your case (just as an example) you can use the below simple steps to delete and recreate the file system: 1) To delete file system and NSDs: a) Unmount file system - mmumount -a b) Delete file system - mmdelfs c) Delete NSDs - mmdelnsd "nynsd1;nynsd2" 2) To create file system with both disks in one system pool and having dataAndMetadata and data and metadata replica and directly attached to the nodes, you can use following steps: a) Create a /tmp/nsd file and fill it up with below information :::dataAndMetadata:1:nynsd1:system :::dataAndMetadata:2:nynsd2:system b) Use mmcrnsd -F /tmp/nsd to create NSDs c) Create file system using (just an example with assumptions on config) - mmcrfs /dev/fs_gpfs01 -F /tmp/nsd -A yes -B 256K -n 32 -m 2 -r 2 -T /fs_gpfs01 You can refer to above guide for configuring it in other ways as you want. If you have any issues with these steps you can raise PMR and follow proper channel to setup file system as well. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 10:16 PM Subject: Re: [gpfsug-discuss] Fail to mount file system Yes I am ok with deleting. I follow a guide from john olsen at the ibm team from tuscon.. but the guide had steps after the gpfs setup... Is there step by step guide for gpfs cluster setup other than the one in the ibm site? Thank My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... 
LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Wed Jul 5 08:02:19 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 10:02:19 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Hi, [root at LH20-GPFS2 ~]# mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- nynsd1 0A0A9E3D594D5CA8 - - LH20-GPFS2 (not found) directly attached nynsd2 0A0A9E3D594D5CA9 - - LH20-GPFS2 (not found) directly attached mmmount failed with -o rs root at LH20-GPFS2 ~]# mmmount fs_gpfs01 -o rs Wed Jul 5 09:58:29 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. and in logs /var/adm/ras/mmfs.log.latest: 2017-07-05_09:58:30.009+0300: [I] Command: mount fs_gpfs01 2017-07-05_09:58:30.890+0300: Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: Wrong medium type 2017-07-05_09:58:30.890+0300: [E] Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: [W] Command: err 48: mount fs_gpfs01 From scale at us.ibm.com Wed Jul 5 08:44:19 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 5 Jul 2017 13:14:19 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: >From mmlsnsd output can see that the disks are not found by gpfs (maybe some connection issue or they have been changed/removed from backend) Please open a PMR and work with IBM support to resolve this. 
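A practical aside for anyone who lands in the same "(not found) directly attached" state: before (or while) opening the PMR it is worth confirming that the operating system still sees the LUNs behind nynsd1/nynsd2 at all (lsblk, ls -l /dev/disk/by-id/), and re-checking /var/adm/ras/mmfs.log.latest and mmlsnsd -X after any change. If the block devices are present but GPFS device discovery simply does not recognise their type, the nsddevices user exit suggested a little later in this thread can list them explicitly. The sketch below is illustrative only and is modelled on /usr/lpp/mmfs/samples/nsddevices.sample; the device names and the "generic" device type are assumptions to adapt, not details taken from this cluster:

#!/bin/bash
# Minimal /var/mmfs/etc/nsddevices sketch (adapt from nsddevices.sample).
# Each output line is "<device name relative to /dev> <device type>".
for dev in sdb sdc        # placeholder device names
do
    echo "$dev generic"
done
# Per the sample's comments: exit 0 to have GPFS use only the devices
# listed here, non-zero to continue with the built-in discovery as well.
# Check the sample shipped with your release for the exact convention.
exit 1

If this route is taken, the script has to be executable and installed on every node that accesses the disks directly; a subsequent mmlsnsd -X should then show a device path instead of "(not found)".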
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug main discussion list , gpfsug-discuss-bounces at spectrumscale.org Date: 07/05/2017 12:32 PM Subject: Re: [gpfsug-discuss] Fail to mount file system Hi, [root at LH20-GPFS2 ~]# mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- nynsd1 0A0A9E3D594D5CA8 - - LH20-GPFS2 (not found) directly attached nynsd2 0A0A9E3D594D5CA9 - - LH20-GPFS2 (not found) directly attached mmmount failed with -o rs root at LH20-GPFS2 ~]# mmmount fs_gpfs01 -o rs Wed Jul 5 09:58:29 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. and in logs /var/adm/ras/mmfs.log.latest: 2017-07-05_09:58:30.009+0300: [I] Command: mount fs_gpfs01 2017-07-05_09:58:30.890+0300: Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: Wrong medium type 2017-07-05_09:58:30.890+0300: [E] Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: [W] Command: err 48: mount fs_gpfs01 -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Wed Jul 5 09:00:23 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Wed, 5 Jul 2017 10:00:23 +0200 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Hi, maybe you need to specify your NSDs via the nsddevices user exit (Identifies local physical devices that are used as GPFS Network Shared Disks (NSDs).). script to list the NSDs , place it under /var/mmfs/etc/nsddevices. There is a template under /usr/lpp/mmfs/samples/nsddevices.sample which should provide the necessary details. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From ilan84 at gmail.com Wed Jul 5 13:12:14 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 15:12:14 +0300 Subject: [gpfsug-discuss] update smb package ? 
Message-ID: Hi, while trying to enable SMB service i receive the following root at LH20-GPFS1 ~]# mmces service enable smb LH20-GPFS1: Cannot enable SMB service on LH20-GPFS1 LH20-GPFS1: mmcesop: Prerequisite libraries not found or correct version not LH20-GPFS1: installed. Ensure gpfs.smb is properly installed. LH20-GPFS1: mmcesop: Command failed. Examine previous error messages to determine cause. mmdsh: LH20-GPFS1 remote shell process had return code 1. Do i use normal yum update ? how to solve this issue ? Thanks From ilan84 at gmail.com Wed Jul 5 13:18:54 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 15:18:54 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs gpfs.ext-4.2.2-0.x86_64 gpfs.msg.en_US-4.2.2-0.noarch gpfs.gui-4.2.2-0.noarch gpfs.gpl-4.2.2-0.noarch gpfs.gskit-8.0.50-57.x86_64 gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 gpfs.adv-4.2.2-0.x86_64 gpfs.java-4.2.2-0.x86_64 gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 gpfs.base-4.2.2-0.x86_64 gpfs.crypto-4.2.2-0.x86_64 [root at LH20-GPFS1 ~]# uname -a Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux [root at LH20-GPFS1 ~]# From r.sobey at imperial.ac.uk Wed Jul 5 13:23:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 5 Jul 2017 12:23:10 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: You don't have the gpfs.smb package installed. Yum install gpfs.smb Or install the package manually from /usr/lpp/mmfs//smb_rpms [root at ces ~]# rpm -qa | grep gpfs gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan Schwarts Sent: 05 July 2017 13:19 To: gpfsug main discussion list Subject: [gpfsug-discuss] Fwd: update smb package ? [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs gpfs.ext-4.2.2-0.x86_64 gpfs.msg.en_US-4.2.2-0.noarch gpfs.gui-4.2.2-0.noarch gpfs.gpl-4.2.2-0.noarch gpfs.gskit-8.0.50-57.x86_64 gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 gpfs.adv-4.2.2-0.x86_64 gpfs.java-4.2.2-0.x86_64 gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 gpfs.base-4.2.2-0.x86_64 gpfs.crypto-4.2.2-0.x86_64 [root at LH20-GPFS1 ~]# uname -a Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux [root at LH20-GPFS1 ~]# _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Wed Jul 5 13:29:11 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 15:29:11 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? 
In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base | 3.6 kB 00:00:00 epel/x86_64/metalink | 24 kB 00:00:00 epel | 4.3 kB 00:00:00 extras | 3.4 kB 00:00:00 updates | 3.4 kB 00:00:00 (1/4): epel/x86_64/updateinfo | 789 kB 00:00:00 (2/4): extras/7/x86_64/primary_db | 188 kB 00:00:00 (3/4): epel/x86_64/primary_db | 4.8 MB 00:00:00 (4/4): updates/7/x86_64/primary_db | 7.7 MB 00:00:01 Loading mirror speeds from cached hostfile * base: centos.spd.co.il * epel: mirror.nonstop.co.il * extras: centos.spd.co.il * updates: centos.spd.co.il No package gpfs.smb available. Error: Nothing to do [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ something is missing in my machine :) On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: > You don't have the gpfs.smb package installed. > > > > Yum install gpfs.smb > > > > Or install the package manually from /usr/lpp/mmfs//smb_rpms > > > > [root at ces ~]# rpm -qa | grep gpfs > > gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 > > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan Schwarts > Sent: 05 July 2017 13:19 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Fwd: update smb package ? > > > > [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs > > gpfs.ext-4.2.2-0.x86_64 > > gpfs.msg.en_US-4.2.2-0.noarch > > gpfs.gui-4.2.2-0.noarch > > gpfs.gpl-4.2.2-0.noarch > > gpfs.gskit-8.0.50-57.x86_64 > > gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 > > gpfs.adv-4.2.2-0.x86_64 > > gpfs.java-4.2.2-0.x86_64 > > gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 > > gpfs.base-4.2.2-0.x86_64 > > gpfs.crypto-4.2.2-0.x86_64 > > [root at LH20-GPFS1 ~]# uname -a > > Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 > > 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux > > [root at LH20-GPFS1 ~]# > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From r.sobey at imperial.ac.uk Wed Jul 5 13:41:29 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 5 Jul 2017 12:41:29 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: Ah... yes you need to download the protocols version of gpfs from Fix Central. Same GPFS but with the SMB/Object etc packages. -----Original Message----- From: Ilan Schwarts [mailto:ilan84 at gmail.com] Sent: 05 July 2017 13:29 To: gpfsug main discussion list ; Sobey, Richard A Subject: Re: [gpfsug-discuss] Fwd: update smb package ? [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base | 3.6 kB 00:00:00 epel/x86_64/metalink | 24 kB 00:00:00 epel | 4.3 kB 00:00:00 extras | 3.4 kB 00:00:00 updates | 3.4 kB 00:00:00 (1/4): epel/x86_64/updateinfo | 789 kB 00:00:00 (2/4): extras/7/x86_64/primary_db | 188 kB 00:00:00 (3/4): epel/x86_64/primary_db | 4.8 MB 00:00:00 (4/4): updates/7/x86_64/primary_db | 7.7 MB 00:00:01 Loading mirror speeds from cached hostfile * base: centos.spd.co.il * epel: mirror.nonstop.co.il * extras: centos.spd.co.il * updates: centos.spd.co.il No package gpfs.smb available. 
Error: Nothing to do [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ something is missing in my machine :) On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: > You don't have the gpfs.smb package installed. > > > > Yum install gpfs.smb > > > > Or install the package manually from /usr/lpp/mmfs//smb_rpms > > > > [root at ces ~]# rpm -qa | grep gpfs > > gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 > > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan > Schwarts > Sent: 05 July 2017 13:19 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Fwd: update smb package ? > > > > [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs > > gpfs.ext-4.2.2-0.x86_64 > > gpfs.msg.en_US-4.2.2-0.noarch > > gpfs.gui-4.2.2-0.noarch > > gpfs.gpl-4.2.2-0.noarch > > gpfs.gskit-8.0.50-57.x86_64 > > gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 > > gpfs.adv-4.2.2-0.x86_64 > > gpfs.java-4.2.2-0.x86_64 > > gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 > > gpfs.base-4.2.2-0.x86_64 > > gpfs.crypto-4.2.2-0.x86_64 > > [root at LH20-GPFS1 ~]# uname -a > > Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 > > 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux > > [root at LH20-GPFS1 ~]# > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From ilan84 at gmail.com Wed Jul 5 14:08:39 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 16:08:39 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: Sorry for newbish question, What do you mean by "from Fix Central", Do i need to define another repository for the yum ? or download manually ? its spectrum scale 4.2.2 On Wed, Jul 5, 2017 at 3:41 PM, Sobey, Richard A wrote: > Ah... yes you need to download the protocols version of gpfs from Fix Central. Same GPFS but with the SMB/Object etc packages. > > -----Original Message----- > From: Ilan Schwarts [mailto:ilan84 at gmail.com] > Sent: 05 July 2017 13:29 > To: gpfsug main discussion list ; Sobey, Richard A > Subject: Re: [gpfsug-discuss] Fwd: update smb package ? > > [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base > > | 3.6 kB 00:00:00 > epel/x86_64/metalink > > | 24 kB 00:00:00 > epel > > | 4.3 kB 00:00:00 > extras > > | 3.4 kB 00:00:00 > updates > > | 3.4 kB 00:00:00 > (1/4): epel/x86_64/updateinfo > > | 789 kB 00:00:00 > (2/4): extras/7/x86_64/primary_db > > | 188 kB 00:00:00 > (3/4): epel/x86_64/primary_db > > | 4.8 MB 00:00:00 > (4/4): updates/7/x86_64/primary_db > > | 7.7 MB 00:00:01 > Loading mirror speeds from cached hostfile > * base: centos.spd.co.il > * epel: mirror.nonstop.co.il > * extras: centos.spd.co.il > * updates: centos.spd.co.il > No package gpfs.smb available. > Error: Nothing to do > > > [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ > gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ > > > something is missing in my machine :) > > > On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: >> You don't have the gpfs.smb package installed. 
>> >> >> >> Yum install gpfs.smb >> >> >> >> Or install the package manually from /usr/lpp/mmfs//smb_rpms >> >> >> >> [root at ces ~]# rpm -qa | grep gpfs >> >> gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 >> >> >> >> >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan >> Schwarts >> Sent: 05 July 2017 13:19 >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] Fwd: update smb package ? >> >> >> >> [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs >> >> gpfs.ext-4.2.2-0.x86_64 >> >> gpfs.msg.en_US-4.2.2-0.noarch >> >> gpfs.gui-4.2.2-0.noarch >> >> gpfs.gpl-4.2.2-0.noarch >> >> gpfs.gskit-8.0.50-57.x86_64 >> >> gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 >> >> gpfs.adv-4.2.2-0.x86_64 >> >> gpfs.java-4.2.2-0.x86_64 >> >> gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 >> >> gpfs.base-4.2.2-0.x86_64 >> >> gpfs.crypto-4.2.2-0.x86_64 >> >> [root at LH20-GPFS1 ~]# uname -a >> >> Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 >> >> 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux >> >> [root at LH20-GPFS1 ~]# >> >> _______________________________________________ >> >> gpfsug-discuss mailing list >> >> gpfsug-discuss at spectrumscale.org >> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > -- > > > - > Ilan Schwarts -- - Ilan Schwarts From S.J.Thompson at bham.ac.uk Wed Jul 5 14:40:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 5 Jul 2017 13:40:46 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: IBM code comes from either IBM Passport Advantage (where you sign in with a corporate account that lists your product associations), or from IBM Fix Central (google it). Fix Central is supposed to be for service updates. Give the lack of experience, you may want to look at the install toolkit which ships with Spectrum Scale. Simon On 05/07/2017, 14:08, "gpfsug-discuss-bounces at spectrumscale.org on behalf of ilan84 at gmail.com" wrote: >Sorry for newbish question, >What do you mean by "from Fix Central", >Do i need to define another repository for the yum ? or download manually >? >its spectrum scale 4.2.2 > >On Wed, Jul 5, 2017 at 3:41 PM, Sobey, Richard A >wrote: >> Ah... yes you need to download the protocols version of gpfs from Fix >>Central. Same GPFS but with the SMB/Object etc packages. >> >> -----Original Message----- >> From: Ilan Schwarts [mailto:ilan84 at gmail.com] >> Sent: 05 July 2017 13:29 >> To: gpfsug main discussion list ; >>Sobey, Richard A >> Subject: Re: [gpfsug-discuss] Fwd: update smb package ? >> >> [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: >>fastestmirror, langpacks base >> >> | 3.6 kB 00:00:00 >> epel/x86_64/metalink >> >> | 24 kB 00:00:00 >> epel >> >> | 4.3 kB 00:00:00 >> extras >> >> | 3.4 kB 00:00:00 >> updates >> >> | 3.4 kB 00:00:00 >> (1/4): epel/x86_64/updateinfo >> >> | 789 kB 00:00:00 >> (2/4): extras/7/x86_64/primary_db >> >> | 188 kB 00:00:00 >> (3/4): epel/x86_64/primary_db >> >> | 4.8 MB 00:00:00 >> (4/4): updates/7/x86_64/primary_db >> >> | 7.7 MB 00:00:01 >> Loading mirror speeds from cached hostfile >> * base: centos.spd.co.il >> * epel: mirror.nonstop.co.il >> * extras: centos.spd.co.il >> * updates: centos.spd.co.il >> No package gpfs.smb available. 
>> Error: Nothing to do >> >> >> [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ >> gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ >> >> >> something is missing in my machine :) >> >> >> On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A >> wrote: >>> You don't have the gpfs.smb package installed. >>> >>> >>> >>> Yum install gpfs.smb >>> >>> >>> >>> Or install the package manually from /usr/lpp/mmfs//smb_rpms >>> >>> >>> >>> [root at ces ~]# rpm -qa | grep gpfs >>> >>> gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 >>> >>> >>> >>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan >>> Schwarts >>> Sent: 05 July 2017 13:19 >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] Fwd: update smb package ? >>> >>> >>> >>> [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs >>> >>> gpfs.ext-4.2.2-0.x86_64 >>> >>> gpfs.msg.en_US-4.2.2-0.noarch >>> >>> gpfs.gui-4.2.2-0.noarch >>> >>> gpfs.gpl-4.2.2-0.noarch >>> >>> gpfs.gskit-8.0.50-57.x86_64 >>> >>> gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 >>> >>> gpfs.adv-4.2.2-0.x86_64 >>> >>> gpfs.java-4.2.2-0.x86_64 >>> >>> gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 >>> >>> gpfs.base-4.2.2-0.x86_64 >>> >>> gpfs.crypto-4.2.2-0.x86_64 >>> >>> [root at LH20-GPFS1 ~]# uname -a >>> >>> Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 >>> >>> 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux >>> >>> [root at LH20-GPFS1 ~]# >>> >>> _______________________________________________ >>> >>> gpfsug-discuss mailing list >>> >>> gpfsug-discuss at spectrumscale.org >>> >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> -- >> >> >> - >> Ilan Schwarts > > > >-- > > >- >Ilan Schwarts >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From hpc-luke at uconn.edu Wed Jul 5 15:52:52 2017 From: hpc-luke at uconn.edu (hpc-luke at uconn.edu) Date: Wed, 05 Jul 2017 10:52:52 -0400 Subject: [gpfsug-discuss] Mass UID migration suggestions Message-ID: <595cfd44.kc2G2OUXdgiX+srO%hpc-luke@uconn.edu> Thank you both, I was already using the c++ stl hash map to do the mapping of uid_t to uid_t, but I will use that example to learn how to use the proper gpfs apis. And thank you for the ACL suggestion, as that is likely the best way to handle certain users who are logged in/running jobs constantly, where we would not like to force them to logout. And thank you for the reminder to re-run backups. Thank you for your time, Luke Storrs-HPC University of Connecticut From mweil at wustl.edu Wed Jul 5 16:51:50 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 5 Jul 2017 10:51:50 -0500 Subject: [gpfsug-discuss] pmcollector node Message-ID: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> Hello all, Question on the requirements on pmcollector node/s for a 500+ node cluster. Is there a sizing guide? What specifics should we scale? CPU Disks memory? Thanks Matt From kkr at lbl.gov Wed Jul 5 17:23:38 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 5 Jul 2017 09:23:38 -0700 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Message-ID: As I understand it, there is currently no way to collect just a subset of stats in a category. 
For example, CPU stats are: cpu_contexts cpu_guest cpu_guest_nice cpu_hiq cpu_idle cpu_interrupts cpu_iowait cpu_nice cpu_siq cpu_steal cpu_system cpu_user but I'm only interested in tracking a subset. The config file seems to want the category "CPU" which seems like an all-or-nothing approach. I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? Thanks, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 5 18:00:44 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 5 Jul 2017 17:00:44 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Message-ID: <11A5144D-A5AF-4829-B7D4-4313F357C6CB@nuance.com> Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Jul 5 19:22:14 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 5 Jul 2017 11:22:14 -0700 Subject: [gpfsug-discuss] Meaning of API Stats Category In-Reply-To: References: Message-ID: Thank you Eric. That did help. On Mon, Jun 12, 2017 at 2:01 PM, IBM Spectrum Scale wrote: > Hello Kristy, > > The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of > view of "applications" in the sense that they provide stats about I/O > requests made to files in GPFS file systems from user level applications > using POSIX interfaces like open(), close(), read(), write(), etc. > > This is in contrast to similarly named sensors without the "API" suffix, > like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O > requests made by the GPFS code to NSDs (disks) making up GPFS file systems. > > The relationship between application I/O and disk I/O might or might not > be obvious. Consider some examples. An application that starts > sequentially reading a file might, at least initially, cause more disk I/O > than expected because GPFS has decided to prefetch data. An application > write() might not immediately cause a the writing of disk blocks due to the > operation of the pagepool. Ultimately, application write()s might cause > twice as much data written to disk due to the replication factor of the > file system. Application I/O concerns itself with user data; disk I/O > might have to occur to handle the user data and associated file system > metadata (like inodes and indirect blocks). > > The difference between GPFSFileSystemAPI and GPFSNodeAPI: > GPFSFileSystemAPI reports stats for application I/O per filesystem per > node; GPFSNodeAPI reports application I/O stats per node. Similarly, > GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode > reports disk I/O stats per node. > > I hope this helps. 
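A practical footnote to the two perfmon threads above: collection granularity today is per sensor, controlled by each sensor's period in the sensor configuration, which is exactly the all-or-nothing behaviour behind the RFE idea. A quick way to see what is currently enabled is sketched below; the paths and commands assume a stock 4.2.x install where the sensor configuration is GPFS-managed, so treat them as a starting point rather than gospel:

# Show the sensor configuration GPFS distributes to the sensor nodes
mmperfmon config show

# Or inspect the generated file on a sensor node: each sensor stanza has
# a name (the category, e.g. "CPU") and a period in seconds, and a period
# of 0 disables that whole sensor; there is no per-metric switch.
grep -E 'name|period' /opt/IBM/zimon/ZIMonSensors.cfg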
> Eric Agar > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 06/12/2017 04:43 PM > Subject: Re: [gpfsug-discuss] Meaning of API Stats Category > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Kristy > > What I *think* the difference is: > > gpfs_fis: - calls to the GPFS file system interface > gpfs_fs: calls from the node that actually make it to the NSD > server/metadata > > The difference being what?s served out of the local node pagepool. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > *From: * on behalf of Kristy > Kallback-Rose > * Reply-To: *gpfsug main discussion list > > * Date: *Monday, June 12, 2017 at 3:17 PM > * To: *gpfsug main discussion list > * Subject: *[EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category > > Hi, > > Can anyone provide more detail about what is meant by the following two > categories of stats? The PDG has a limited description as far as I could > see. I'm not sure what is meant by Application PoV. Would the Grafana > bridge count as an "application"? > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfadden at us.ibm.com Wed Jul 5 19:50:24 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Wed, 5 Jul 2017 18:50:24 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks In-Reply-To: <11A5144D-A5AF-4829-B7D4-4313F357C6CB@nuance.com> Message-ID: What do You mean by category? Node class, metric type or something else? On Jul 5, 2017, 10:01:33 AM, Robert.Oesterlin at nuance.com wrote: From: Robert.Oesterlin at nuance.com To: gpfsug-discuss at spectrumscale.org Cc: Date: Jul 5, 2017 10:01:33 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sfadden at us.ibm.com Wed Jul 5 19:51:46 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Wed, 5 Jul 2017 18:51:46 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks In-Reply-To: Message-ID: Never mind just saw your earlier email On Jul 5, 2017, 11:50:24 AM, sfadden at us.ibm.com wrote: From: sfadden at us.ibm.com To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: Jul 5, 2017 11:50:24 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks What do You mean by category? Node class, metric type or something else? On Jul 5, 2017, 10:01:33 AM, Robert.Oesterlin at nuance.com wrote: From: Robert.Oesterlin at nuance.com To: gpfsug-discuss at spectrumscale.org Cc: Date: Jul 5, 2017 10:01:33 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Jul 6 06:37:33 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 6 Jul 2017 11:07:33 +0530 Subject: [gpfsug-discuss] pmcollector node In-Reply-To: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> References: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> Message-ID: Hi Anna, Can you please check if you can answer this. Or else let me know who to contact for this. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Matt Weil To: gpfsug-discuss at spectrumscale.org Date: 07/05/2017 09:22 PM Subject: [gpfsug-discuss] pmcollector node Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello all, Question on the requirements on pmcollector node/s for a 500+ node cluster. Is there a sizing guide? What specifics should we scale? CPU Disks memory? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Wei1.Guo at UTSouthwestern.edu Thu Jul 6 18:49:32 2017 From: Wei1.Guo at UTSouthwestern.edu (Wei Guo) Date: Thu, 6 Jul 2017 17:49:32 +0000 Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory Message-ID: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> Hi, All, We are testing to upgrade our clients to new RHEL 7.3 kernel with GPFS 4.2.1.0. When we have 3.10.0-514.26.2.el7, installing the gplbin has the following errors: # ./mmbuildgpl --build-package -v # cd /root/rpmbuild/RPMS/x86_64/ # rpm -ivh gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64.rpm Running transaction Installing : gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64 1/1 depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory depmod: ERROR: fstatat(4, mmfslinux.ko): No such file or directory depmod: ERROR: fstatat(4, tracedev.ko): No such file or directory depmod -a also show the three kernel extension not found. However, in the following directory, they are there. # pwd /lib/modules/3.10.0-514.26.2.el7.x86_64/extra # ls kernel mmfs26.ko mmfslinux.ko tracedev.ko The error does not show in a slightly older kernel -3.10.0-514.21.2 version. From https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Table 29, both versions should be supported. RHEL Distribution Latest Kernel Level Tested1 Minimum Kernel Level Required2 Minimum IBM Spectrum Scale Level Tested3 Minimum IBM Spectrum Scale Level Supported4 7.3 3.10.0-514 3.10.0-514 V4.1.1.11/V4.2.2.1 V4.1.1.11/V4.2.1.2 For technical reasons, this test node will not be added to production. A previous thread http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-April/001529.html indicated that this will be OK. However, it is better to get a clear conclusion before we update other client nodes. Shall we recompile the kernel? Thanks all. Wei Guo ________________________________ UT Southwestern Medical Center The future of medicine, today. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 6 18:52:44 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 6 Jul 2017 17:52:44 +0000 Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory In-Reply-To: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> References: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> Message-ID: Look in the kernel weak-updates directory, you will probably find some broken files in there. These come from things trying to update the kernel modules when you do the kernel upgrade. Just delete the three gpfs related ones and run depmod The safest way is to remove the gpfs.gplbin packages, then upgrade the kernel, reboot and add the new gpfs.gplbin packages for the new kernel. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Wei Guo [Wei1.Guo at UTSouthwestern.edu] Sent: 06 July 2017 18:49 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory Hi, All, We are testing to upgrade our clients to new RHEL 7.3 kernel with GPFS 4.2.1.0. 
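(To make the weak-updates suggestion above concrete -- a rough sketch only, with the module names taken from the error output in this thread and the paths assumed to be the standard RHEL locations:

# look for dangling GPFS symlinks left behind by the kernel update
ls -l /lib/modules/$(uname -r)/weak-updates/ | grep -E 'mmfs26|mmfslinux|tracedev'

# remove the broken entries and rebuild the module dependency map
rm -f /lib/modules/$(uname -r)/weak-updates/mmfs26.ko
rm -f /lib/modules/$(uname -r)/weak-updates/mmfslinux.ko
rm -f /lib/modules/$(uname -r)/weak-updates/tracedev.ko
depmod -a

# the freshly installed modules should still be present under .../extra
ls /lib/modules/$(uname -r)/extra

This is an illustration, not an official procedure; the cleaner route is still to remove gpfs.gplbin before the kernel upgrade and install the package built for the new kernel afterwards.)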
When we have 3.10.0-514.26.2.el7, installing the gplbin has the following errors: # ./mmbuildgpl --build-package ?v # cd /root/rpmbuild/RPMS/x86_64/ # rpm -ivh gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64.rpm Running transaction Installing : gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64 1/1 depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory depmod: ERROR: fstatat(4, mmfslinux.ko): No such file or directory depmod: ERROR: fstatat(4, tracedev.ko): No such file or directory depmod -a also show the three kernel extension not found. However, in the following directory, they are there. # pwd /lib/modules/3.10.0-514.26.2.el7.x86_64/extra # ls kernel mmfs26.ko mmfslinux.ko tracedev.ko The error does not show in a slightly older kernel -3.10.0-514.21.2 version. From https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Table 29, both versions should be supported. RHEL Distribution Latest Kernel Level Tested1 Minimum Kernel Level Required2 Minimum IBM Spectrum Scale Level Tested3 Minimum IBM Spectrum Scale Level Supported4 7.3 3.10.0-514 3.10.0-514 V4.1.1.11/V4.2.2.1 V4.1.1.11/V4.2.1.2 For technical reasons, this test node will not be added to production. A previous thread http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-April/001529.html indicated that this will be OK. However, it is better to get a clear conclusion before we update other client nodes. Shall we recompile the kernel? Thanks all. Wei Guo ________________________________ UT Southwestern Medical Center The future of medicine, today. From abeattie at au1.ibm.com Thu Jul 6 06:07:07 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 6 Jul 2017 05:07:07 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14992893800360.png Type: image/png Size: 431718 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14992893800362.png Type: image/png Size: 1001127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14993172756190.png Type: image/png Size: 381651 bytes Desc: not available URL: From neil.wilson at metoffice.gov.uk Fri Jul 7 10:18:40 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Fri, 7 Jul 2017 09:18:40 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: Hi Andrew, Have you created new dashboards for GPFS? This shows you how to do it https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Creating%20Grafana%20dashboard Alternatively there are some predefined dashboards here https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Importing%20predefined%20Grafana%20dashboards that you can import and have a play around with? Regards Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. 
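In addition to the dashboards, it may be worth confirming that the collector and the bridge are actually serving data before digging into panel definitions. A rough sketch, with the port and metric names assumed from the knowledge-center defaults and option syntax that may vary slightly by release:

# ask the collector directly for a GPFS file system metric (GPFSFilesystem sensor)
mmperfmon query gpfs_fs_bytes_read -n 5

# list the configured sensors and their periods -- a period of 0 means that sensor is disabled
mmperfmon config show

# the Grafana bridge in the knowledge-center setup listens on port 4242 by default;
# check it is listening on the node the Grafana datasource points at
ss -ltn | grep 4242

If mmperfmon returns rows but Grafana stays empty, the datasource URL/port or the dashboard time range is the usual culprit; if mmperfmon itself returns only null values, the relevant sensor period is the first thing to check.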
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 06 July 2017 06:07 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Greetings, I'm currently setting up Grafana to interact with one of our Scale Clusters and i've followed the knowledge centre link in terms of setup. https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm However while everything appears to be working i'm not seeing any data coming through the reports within the grafana server, even though I can see data in the Scale GUI The current environment: [root at sc01n02 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: sc01.spectrum GPFS cluster id: 18085710661892594990 GPFS UID domain: sc01.spectrum Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------ 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon [root at sc01n02 ~]# [root at sc01n02 ~]# mmlsconfig Configuration data for cluster sc01.spectrum: --------------------------------------------- clusterName sc01.spectrum clusterId 18085710661892594990 autoload yes profile gpfsProtocolDefaults dmapiFileHandleSize 32 minReleaseLevel 4.2.2.0 ccrEnabled yes cipherList AUTHONLY maxblocksize 16M [cesNodes] maxMBpS 5000 numaMemoryInterleave yes enforceFilesetQuotaOnRoot yes workerThreads 512 [common] tscCmdPortRange 60000-61000 cesSharedRoot /ibm/cesSharedRoot/ces cifsBypassTraversalChecking yes syncSambaMetadataOps yes cifsBypassShareLocksOnRename yes adminMode central File systems in cluster sc01.spectrum: -------------------------------------- /dev/cesSharedRoot /dev/icos_demo /dev/scale01 [root at sc01n02 ~]# [root at sc01n02 ~]# systemctl status pmcollector ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. Loaded: loaded (/etc/rc.d/init.d/pmcollector) Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago Docs: man:systemd-sysv-generator(8) Main PID: 2693 (ZIMonCollector) CGroup: /system.slice/pmcollector.service ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance mon...... May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor collector... May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance moni...r.. Hint: Some lines were ellipsized, use -l to show in full. From Grafana Server: [cid:image002.jpg at 01D2F70A.17F595F0] when I send a set of files to the cluster (3.8GB) I can see performance metrics within the Scale GUI [cid:image004.jpg at 01D2F70A.17F595F0] yet from the Grafana Dashboard im not seeing any data points [cid:image006.jpg at 01D2F70A.17F595F0] Can anyone provide some hints as to what might be happening? Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 14522 bytes Desc: image002.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 60060 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 25781 bytes Desc: image006.jpg URL: From olaf.weiser at de.ibm.com Fri Jul 7 10:18:13 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 7 Jul 2017 09:18:13 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 431718 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1001127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 381651 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Jul 7 13:01:39 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 7 Jul 2017 12:01:39 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: Just following up on this, has anyone successfully deployed Protocols (SMB) on RHEL 7.3 with the 4.2.3-2 packages? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 July 2017 12:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. 
Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Jul 7 23:32:40 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 7 Jul 2017 15:32:40 -0700 Subject: [gpfsug-discuss] Hold the Date - Spectrum Scale Day @ HPCXXL (Sept 2017, NYC) Message-ID: Hello, More details will be provided as they become available, but just so you can make a placeholder on your calendar, there will be a Spectrum Scale Day the week of September 25th - 29th, likely Thursday, September 28th. This will be a part of the larger HPCXXL meeting (https://www.spxxl.org/?q=New-York-City-2017 ). You may recall this group was formerly called SPXXL and the website is in the process of transitioning to the new name (and getting a new certificate). You will be able to attend *just* the Spectrum Scale day if that is the only portion of the event you would like to attend. The NYC location is great for Spectrum Scale events because many IBMers, including developers, can come in from Poughkeepsie. More as we get closer to the date and details are settled. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Sun Jul 9 08:26:44 2017 From: a.khiredine at meteo.dz (Atmane) Date: Sun, 9 Jul 2017 08:26:44 +0100 Subject: [gpfsug-discuss] GPFS Storage Server (GSS) Message-ID: From a.khiredine at meteo.dz Sun Jul 9 09:00:07 2017 From: a.khiredine at meteo.dz (Atmane) Date: Sun, 9 Jul 2017 09:00:07 +0100 Subject: [gpfsug-discuss] get free space in GSS Message-ID: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From laurence at qsplace.co.uk Sun Jul 9 09:58:05 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Sun, 09 Jul 2017 09:58:05 +0100 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: Message-ID: You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: >Dear all, > >My name is Khiredine Atmane and I am a HPC system administrator at the > >National Office of Meteorology Algeria . We have a GSS24 running >gss2.5.10.3-3b and gpfs-4.2.0.3. 
> >GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks >total, 0 >NVRAM partitions > >disks = 3Tb >SSD = 200 Gb >df -h >Filesystem Size Used Avail Use% Mounted on > >/dev/gpfs1 49T 18T 31T 38% /gpfs1 >/dev/gpfs2 53T 13T 40T 25% /gpfs2 >/dev/gpfs3 25T 4.9T 20T 21% /gpfs3 >/dev/gpfs4 11T 133M 11T 1% /gpfs4 >/dev/gpfs5 323T 34T 290T 11% /gpfs5 > >Total Is 461 To > >I think we have more space >Could anyone make recommendation to troubleshoot find how many free >space >in GSS ? >How to find the available space ? >Thank you! > >Atmane > > > >-- >Atmane Khiredine >HPC System Admin | Office National de la M?t?orologie >T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : >a.khiredine at meteo.dz >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Sun Jul 9 13:26:26 2017 From: a.khiredine at meteo.dz (atmane khiredine) Date: Sun, 9 Jul 2017 12:26:26 +0000 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: , Message-ID: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> thank you very much for replying. I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual 
rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. 
From janfrode at tanso.net Sun Jul 9 17:45:32 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sun, 09 Jul 2017 16:45:32 +0000 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> References: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: You had it here: [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low 12 GiB in DA1, and 4096 MiB i DA2, but effectively you'll get less when you add a raidCode to the vdisk. Best way to use it id to just don't specify a size to the vdisk, and max possible size will be used. -jf s?n. 9. jul. 2017 kl. 14.26 skrev atmane khiredine : > thank you very much for replying. I can not find the free space > > Here is the output of mmlsrecoverygroup > > [root at server1 ~]#mmlsrecoverygroup > > declustered > arrays with > recovery group vdisks vdisks servers > ------------------ ----------- ------ ------- > BB1RGL 3 18 server1,server2 > BB1RGR 3 18 server2,server1 > -------------------------------------------------------------- > [root at server ~]# mmlsrecoverygroup BB1RGL -L > > declustered > recovery group arrays vdisks pdisks format version > ----------------- ----------- ------ ------ -------------- > BB1RGL 3 18 119 4.2.0.1 > > declustered needs replace > scrub background activity > array service vdisks pdisks spares threshold free space > duration task progress priority > ----------- ------- ------ ------ ------ --------- ---------- > -------- ------------------------- > LOG no 1 3 0,0 1 558 GiB 14 > days scrub 51% low > DA1 no 11 58 2,31 2 12 GiB 14 > days scrub 78% low > DA2 no 6 58 2,31 2 4096 MiB 14 > days scrub 10% low > > declustered > checksum > vdisk RAID code array vdisk size block > size granularity state remarks > ------------------ ------------------ ----------- ---------- > ---------- ----------- ----- ------- > gss0_logtip 3WayReplication LOG 128 MiB 1 > MiB 512 ok logTip > gss0_loghome 4WayReplication DA1 40 GiB 1 > MiB 512 ok log > BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 > MiB 32 KiB ok > BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 > MiB 32 KiB ok > BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 > MiB 32 KiB ok > BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 > MiB 32 KiB ok > > config data declustered 
array VCD spares actual rebuild > spare space remarks > ------------------ ------------------ ------------- > --------------------------------- ---------------- > rebuild space DA1 31 34 pdisk > rebuild space DA2 31 35 pdisk > > > config data max disk group fault tolerance actual disk group > fault tolerance remarks > ------------------ --------------------------------- > --------------------------------- ---------------- > rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 > drawer limiting fault tolerance > system index 2 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > > vdisk max disk group fault tolerance actual disk group > fault tolerance remarks > ------------------ --------------------------------- > --------------------------------- ---------------- > gss0_logtip 2 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS4_DATA1 2 drawer 2 drawer > BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS1_DATA1 2 drawer 2 drawer > BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS3_DATA1 2 drawer 2 drawer > BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS2_DATA1 2 drawer 2 drawer > BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS2_DATA2 2 drawer 2 drawer > BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS1_DATA2 2 drawer 2 drawer > BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS5_DATA1 2 drawer 2 drawer > BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS5_DATA2 2 drawer 2 drawer > > active recovery group server servers > ----------------------------------------------- ------- > server1 server1,server2 > > > Atmane Khiredine > HPC System Administrator | Office National de la M?t?orologie > T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : > a.khiredine at meteo.dz > ________________________________ > De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] > Envoy? : dimanche 9 juillet 2017 09:58 > ? : gpfsug main discussion list; atmane khiredine; > gpfsug-discuss at spectrumscale.org > Objet : Re: [gpfsug-discuss] get free space in GSS > > You can check the recovery groups to see if there is any remaining space. > > I don't have access to my test system to confirm the syntax however if > memory serves. > > Run mmlsrecoverygroup to get a list of all the recovery groups then: > > mmlsrecoverygroup -L > > This will list all your declustered arrays and their free space. > > Their might be another method, however this way has always worked well for > me. > > -- Lauz > > > > On 9 July 2017 09:00:07 BST, Atmane wrote: > > Dear all, > > My name is Khiredine Atmane and I am a HPC system administrator at the > National Office of Meteorology Algeria . We have a GSS24 running > gss2.5.10.3-3b and gpfs-4.2.0.3. 
> > GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 > NVRAM partitions > > disks = 3Tb > SSD = 200 Gb > df -h > Filesystem Size Used Avail Use% Mounted on > > /dev/gpfs1 49T 18T 31T 38% /gpfs1 > /dev/gpfs2 53T 13T 40T 25% /gpfs2 > /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 > /dev/gpfs4 11T 133M 11T 1% /gpfs4 > /dev/gpfs5 323T 34T 290T 11% /gpfs5 > > Total Is 461 To > > I think we have more space > Could anyone make recommendation to troubleshoot find how many free space > in GSS ? > How to find the available space ? > Thank you! > > Atmane > > > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Sun Jul 9 17:52:02 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Sun, 9 Jul 2017 12:52:02 -0400 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> References: , <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: Hi Atmane, >> I can not find the free space Based on your output below, your setup currently has two recovery groups BB1RGL and BB1RGR. Issue "mmlsrecoverygroup BB1RGL -L" and "mmlsrecoverygroup BB1RGR -L" to obtain free space in each DA. Based on your "mmlsrecoverygroup BB1RGL -L" output below, BB1RGL "DA1" has 12GiB and "DA2" has 4GiB free space. The metadataOnly and dataOnly vdisk/NSD are created from DA1 and DA2. declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low In addition, you may use "mmlsnsd" to obtain mapping of file-system to vdisk/NSD + use "mmdf " command to query user or available capacity on a GPFS file system. Hope this helps, -Kums From: atmane khiredine To: Laurence Horrocks-Barlow , "gpfsug main discussion list" Date: 07/09/2017 08:27 AM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org thank you very much for replying. 
I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 
drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Mon Jul 10 10:39:27 2017 From: a.khiredine at meteo.dz (Atmane) Date: Mon, 10 Jul 2017 10:39:27 +0100 Subject: [gpfsug-discuss] New Version Of GSS 3.1b 16-Feb-2017 Message-ID: Dear all, There is a new version of GSS Is there someone who made the update ? thanks Lenovo System x GPFS Storage Server (GSS) Version 3.1b 16-Feb-2017 What?s new in Lenovo GSS, Version 3.1 ? New features: - RHEL 7.2 ? GSS Expandability ? Online addition of more JBODs to an existing GSS building block (max. 6 JBOD total) ? Must be same JBOD type and drive type as in the existing building block ? Selectable Spectrum Scale (GPFS) software version and edition ?Four GSS tarballs, for Spectrum Scale {Standard or Advanced Edition} @ {v4.1.1 or v4.2.1} ? Hardware news: ? 10TB drive support: two JBOD MTMs (0796-HCJ/16X and 0796-HCK/17X), drive FRU (01GV110), no drive option ? Withdrawal of the 3TB drive models (0796-HC3/07X and 0796-HC4/08X) ? GSS22 in xConfig (no more need for special-bid) ? Software and firmware news: ? Update of IBM Spectrum Scale v4.2.1 to latest PTF level ? Update of Intel OPA from 10.1 to 10.2 (incl. performance fixes) ? 
Refresh of server and adapter FW levels to Scalable Infrastructure ?16C? recommended levels ? Not much news this time, as ?16C? FW is almost identical to ?16B - List GPFS RPM gpfs.adv-4.2.1-2.12.x86_64.rpm gpfs.base-4.2.1-2.12.x86_64.rpm gpfs.callhome-4.2.1-1.000.el7.noarch.rpm gpfs.callhome-ecc-client-4.2.1-1.000.noarch.rpm gpfs.crypto-4.2.1-2.12.x86_64.rpm gpfs.docs-4.2.1-2.12.noarch.rpm gpfs.ext-4.2.1-2.12.x86_64.rpm gpfs.gnr-4.2.1-2.12.x86_64.rpm gpfs.gnr.base-1.0.0-0.x86_64.rpm gpfs.gpl-4.2.1-2.12.noarch.rpm gpfs.gskit-8.0.50-57.x86_64.rpm gpfs.gss.firmware-4.2.0-5.x86_64.rpm gpfs.gss.pmcollector-4.2.2-2.el7.x86_64.rpm gpfs.gss.pmsensors-4.2.2-2.el7.x86_64.rpm gpfs.gui-4.2.1-2.3.noarch.rpm gpfs.java-4.2.2-2.x86_64.rpm gpfs.msg.en_US-4.2.1-2.12.noarch.rpm -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From Greg.Lehmann at csiro.au Tue Jul 11 05:54:39 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Tue, 11 Jul 2017 04:54:39 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: <4c9ae144c1114b85b7f2cdc27eefd749@exch1-cdc.nexus.csiro.au> Yes, although it is early days for us and I would not say we have finished testing as yet. We have upgraded twice to get there from 4.2.3-0. It seems OK and I have not noticed any changes from 4.2.3.0. Greg From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: Friday, 7 July 2017 10:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Just following up on this, has anyone successfully deployed Protocols (SMB) on RHEL 7.3 with the 4.2.3-2 packages? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 July 2017 12:12 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. 
This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Tue Jul 11 10:36:39 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Tue, 11 Jul 2017 09:36:39 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA Message-ID: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 From abeattie at au1.ibm.com Tue Jul 11 11:14:37 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Tue, 11 Jul 2017 10:14:37 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Message-ID: An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Jul 11 15:46:42 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Jul 2017 14:46:42 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Message-ID: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Tuesday, July 11, 2017 5:15 AM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org; jake.carroll at uq.edu.au Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA Bilich, Reach out to Jake Carrol at Uni of QLD UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet and there is LOTS of tuning that you can do to improve how things work Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Billich Heinrich Rainer (PSI)" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [gpfsug-discuss] does AFM support NFS via RDMA Date: Tue, Jul 11, 2017 7:36 PM Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? 
I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jake.carroll at uq.edu.au Tue Jul 11 22:38:43 2017 From: jake.carroll at uq.edu.au (Jake Carroll) Date: Tue, 11 Jul 2017 21:38:43 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Message-ID: <72D0CC62-8663-4072-AFA1-735D75EEBBE1@uq.edu.au> I?ll be there! From: Bryan Banister Date: Wednesday, 12 July 2017 at 12:46 am To: gpfsug main discussion list Cc: Jake Carroll Subject: RE: [gpfsug-discuss] does AFM support NFS via RDMA Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Tuesday, July 11, 2017 5:15 AM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org; jake.carroll at uq.edu.au Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA Bilich, Reach out to Jake Carrol at Uni of QLD UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet and there is LOTS of tuning that you can do to improve how things work Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Billich Heinrich Rainer (PSI)" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [gpfsug-discuss] does AFM support NFS via RDMA Date: Tue, Jul 11, 2017 7:36 PM Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? 
I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Tue Jul 11 23:07:49 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 11 Jul 2017 15:07:49 -0700 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Message-ID: <9BA6A8E3-D633-4DFF-826F-5ACE49361694@lbl.gov> Sounds good. Is someone willing to take on this talk? User-driven talks on real experiences are always welcome. Cheers, Kristy > On Jul 11, 2017, at 7:46 AM, Bryan Banister wrote: > > Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? > -B > > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org ] On Behalf Of Andrew Beattie > Sent: Tuesday, July 11, 2017 5:15 AM > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org ; jake.carroll at uq.edu.au > Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA > > Bilich, > > Reach out to Jake Carrol at Uni of QLD > > UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet > and there is LOTS of tuning that you can do to improve how things work > > Regards, > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: "Billich Heinrich Rainer (PSI)" > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org " > > Cc: > Subject: [gpfsug-discuss] does AFM support NFS via RDMA > Date: Tue, Jul 11, 2017 7:36 PM > > Hello, > > We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? 
> > We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. > > We run spectrum scale 4.2.2/4.2.3 on Redhat 7. > > Thank you, > > Heiner Billich > > -- > Paul Scherrer Institut > Heiner Billich > WHGA 106 > CH 5232 Villigen > 056 310 36 02 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 12 17:06:40 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 12 Jul 2017 16:06:40 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Message-ID: Interesting. Performance is one thing, but how usable. IBM, watch your back :-) ?WekaIO is the world?s fastest distributed file system, processing four times the workload compared to IBM Spectrum Scale measured on Standard Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. Utilizing only 120 cloud compute instances with locally attached storage, WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM?s high-end FlashSystem 900.? https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Jul 12 18:24:19 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 12 Jul 2017 17:24:19 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: while i really like competition on SpecSFS, the claims from the WekaIO people are lets say 'alternative facts' at best The Spectrum Scale results were done on 4 Nodes with 2 Flash Storage devices attached, they compare this to a WekaIO system with 14 times more memory (14 TB vs 1TB) , 120 SSD's (vs 64 Flashcore Modules) across 15 times more compute nodes (60 vs 4) . said all this, the article claims 1000 builds, while the actual submission only delivers 500 --> https://www.spec.org/sfs2014/results/sfs2014.html so they need 14 times more memory and cores and 2 times flash to show twice as many builds at double the response time, i leave this to everybody who understands this facts to judge how great that result really is. 
Said all this, Spectrum Scale scales almost linear if you double the nodes , network and storage accordingly, so there is no reason to believe we couldn't easily beat this, its just a matter of assemble the HW in a lab and run the test. btw we scale to 10k+ nodes , 2500 times the number we used in our publication :-D Sven On Wed, Jul 12, 2017 at 9:06 AM Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > Interesting. Performance is one thing, but how usable. IBM, watch your > back :-) > > > > *?WekaIO is the world?s fastest distributed file system, processing four > times the workload compared to IBM Spectrum Scale measured on Standard > Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry > benchmark. Utilizing only 120 cloud compute instances with locally attached > storage, WekaIO completed 1,000 simultaneous software builds compared to > 240 on IBM?s high-end FlashSystem 900.?* > > > > > https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 <(507)%20269-0413> > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Jul 12 19:20:06 2017 From: ewahl at osc.edu (Edward Wahl) Date: Wed, 12 Jul 2017 14:20:06 -0400 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: <20170712142006.297cc9f2@osc.edu> Ah benchmarks... There are Lies, damn Lies, and then benchmarks. I've been in HPC a while on both the vendor (Cray) and customer side, and until I see Lustre, BeeGFS, Spectrum Scale, StorNext, OrangeFS, CEPH, Gluster, 'Flash in the pan v1', etc. all run on the EXACT same hardware I take ALL benchmarks with a POUND of salt. Too easy to finagle whatever result you want. Besides, benchmarks and real world performance are vastly different unless you are using IO kernels based on your local apps as your benchmark. I have a feeling MANY of the folks on this list feel similarly. ;) I recall when we figured out how someone cheated a SPEC test once by only using the inner-track of drives. ^_^ Ed On Wed, 12 Jul 2017 16:06:40 +0000 "Oesterlin, Robert" wrote: > Interesting. Performance is one thing, but how usable. IBM, watch your > back :-) > > ?WekaIO is the world?s fastest distributed file system, processing four times > the workload compared to IBM Spectrum Scale measured on Standard Performance > Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. > Utilizing only 120 cloud compute instances with locally attached storage, > WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM?s > high-end FlashSystem 900.? > > https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From r.sobey at imperial.ac.uk Wed Jul 12 19:20:32 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 12 Jul 2017 18:20:32 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: I'm reading it as "WeakIO" which probably isn't a good thing.. 
both in the context of my eyesight and the negative connotation of the product :) ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Oesterlin, Robert Sent: 12 July 2017 17:06 To: gpfsug main discussion list Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Interesting. Performance is one thing, but how usable. IBM, watch your back :-) "WekaIO is the world's fastest distributed file system, processing four times the workload compared to IBM Spectrum Scale measured on Standard Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. Utilizing only 120 cloud compute instances with locally attached storage, WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM's high-end FlashSystem 900." https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 12 19:27:12 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 12 Jul 2017 18:27:12 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Message-ID: <92349D18-3614-4235-B30C-ADCCE3782CDD@nuance.com> Ah yes - Sven keeping us honest! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Wednesday, July 12, 2017 at 12:24 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System while i really like competition on SpecSFS, the claims from the WekaIO people are lets say 'alternative facts' at best The Spectrum Scale results were done on 4 Nodes with 2 Flash Storage devices attached, they compare this to a WekaIO system with 14 times more memory (14 TB vs 1TB) , 120 SSD's (vs 64 Flashcore Modules) across 15 times more compute nodes (60 vs 4) . said all this, the article claims 1000 builds, while the actual submission only delivers 500 --> https://www.spec.org/sfs2014/results/sfs2014.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From sannaik2 at in.ibm.com Fri Jul 14 06:55:30 2017 From: sannaik2 at in.ibm.com (Sandeep Naik1) Date: Fri, 14 Jul 2017 11:25:30 +0530 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: , <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: Hi Atmane, There can be two meanings of available free space. One is what is available in the existing filesystems. For this you rightly referred to the df -h command output. This is the actual free space available in the already created filesystems. Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 The other is the free space available in the DAs. For that, as everyone said, use mmlsrecoverygroup -L. Please note that this will give you raw free capacity. For the usable free capacity in a DA you have to account for the RAID overhead. But based on your output you have very little/no free space left in the DAs.
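If you want to check both views quickly from the command line, something like this should do it (just a sketch - I am reusing the recovery group and filesystem names from your own output, adjust for your setup):

# raw free space per declustered array (DA1/DA2) in each recovery group
mmlsrecoverygroup BB1RGL -L
mmlsrecoverygroup BB1RGR -L

# which vdisks/NSDs back which file system
mmlsnsd

# user-visible free capacity per GPFS file system (same view as df -h)
mmdf gpfs5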
[root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low Thanks, Sandeep Naik Elastic Storage server / GPFS Test ETZ-B, Hinjewadi Pune India (+91) 8600994314 From: "Kumaran Rajaram" To: gpfsug main discussion list , atmane khiredine Date: 09/07/2017 10:22 PM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Atmane, >> I can not find the free space Based on your output below, your setup currently has two recovery groups BB1RGL and BB1RGR. Issue "mmlsrecoverygroup BB1RGL -L" and "mmlsrecoverygroup BB1RGR -L" to obtain free space in each DA. Based on your "mmlsrecoverygroup BB1RGL -L" output below, BB1RGL "DA1" has 12GiB and "DA2" has 4GiB free space. The metadataOnly and dataOnly vdisk/NSD are created from DA1 and DA2. declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low In addition, you may use "mmlsnsd" to obtain mapping of file-system to vdisk/NSD + use "mmdf " command to query user or available capacity on a GPFS file system. Hope this helps, -Kums From: atmane khiredine To: Laurence Horrocks-Barlow , "gpfsug main discussion list" Date: 07/09/2017 08:27 AM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org thank you very much for replying. 
I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 
drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jul 17 13:13:58 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 17 Jul 2017 12:13:58 +0000 Subject: [gpfsug-discuss] Job Vacancy: Research Storage Systems Senior Specialist/Specialist Message-ID: Hi all, Members of this group may be particularly interested in the role "Research Storage Systems Senior Specialist/Specialist"... As part of the University of Birmingham's investment in our ability to support outstanding research by providing technical computing facilities, we are expanding the team and currently have 6 vacancies. I've provided a short description of each post, but please do follow the links where you will find the full job description attached at the bottom of the page. For some of the posts, they are graded either at 7 or 8 and will be appointed based upon skills and experience, the expectation is that if the appointment is made at grade 7 that as the successful candidate grows into the role, we should be able to regrade up. 
Research Storage Systems Senior Specialist/Specialist: https://goo.gl/NsL1EG Responsible for the delivery and maintenance of research storage systems, focussed on the delivery of Spectrum Scale storage systems and data protection. (this is available either as a grade 8 or grade 7 post depending on skills and experience so may suit someone wishing to grow into the senior role) HPC Specialist post (Research Systems Administrator / Senior Research Systems Administrator): https://goo.gl/1SxM4j Helping to deliver and operationally support the technical computing environments, with a focus on supporting and delivery of HPC and HTC services. (this is available either as a grade 7 or grade 8 post depending on skills and experience so may suit someone wishing to grow into the senior role) Research Computing (Analytics): https://goo.gl/uCNdMH Helping our researchers to understand data analytics and supporting their research Senior Research Software Engineer: https://goo.gl/dcGgAz Working with research groups to develop and deliver bespoke software solutions to support their research Research Training and Engagement Officer: https://goo.gl/U48m7z Helping with the delivery and coordination of training and engagement works to support users helping ensure they are able to use the facilities to support their research. Research IT Partner in the College of Arts and Law: https://goo.gl/A7czEA Providing technical knowledge and skills to support project delivery through research bid preparation to successful solution delivery. Simon From cgirda at wustl.edu Mon Jul 17 20:40:42 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Mon, 17 Jul 2017 14:40:42 -0500 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana Message-ID: Hello Team, This is Chakri from Washu at STL. Thank you for the great opportunity to join this group. I am trying to setup performance monitoring for our GPFS cluster. As part of the project configured pmcollector and pmsensors on our GPFS cluster. 1. Created a 'spectrumscale' data-source bridge on our grafana ( NOT SET TO DEFAULT ) https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm 2. Created a new dash-board by importing the pre-built dashboard. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Importing%20predefined%20Grafana%20dashboards Here is the issue. I don't get any graph updates if I don't set "spectrumscale" as DEFAULT data-source but that is breaking rest of the graphs ( we have ton of dashboards). So I had to uncheck the "spectrumscale" as default data-source. If I go and set the "data-source" manually to "spectrumscale" on the pre-built dashboard graphs. I see the wheel spinning but no updates. Any ideas? Thank you Chakri From Robert.Oesterlin at nuance.com Tue Jul 18 12:45:38 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 18 Jul 2017 11:45:38 +0000 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana Message-ID: Hi Chakri If you?re getting the ?ole ?spinning wheel? on your dashboard, then it?s one of two things: 1) The Grafana bridge is not running 2) The dashboard is requesting a metric that isn?t available. 
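A couple of quick sanity checks from the command line first (rough sketch - the metric name, options and port here are just examples/assumptions on my part, adjust for your setup):

# 1) is the collector itself happy?
systemctl status pmcollector

# 2) does the collector actually have data for a known metric?
#    (options from memory - see the mmperfmon man page)
mmperfmon query cpu_user -n 10 -b 1

# 3) is the bridge process running and listening on its port?
#    (4242 is an assumption, use whatever port you started the bridge on)
ps -ef | grep zimonGrafanaIntf
ss -ltn | grep 4242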
Assuming that you've verified that the pmcollector/pmsensor setup is working right in your cluster, I'd then start looking at the log files for the Grafana Bridge and the pmcollector to see if you can determine if either is producing an error - like the metric wasn't found. The other thing to try is to set up a small test graph with a known metric being collected by your pmsensor configuration, rather than try one of Helene's default dashboards, which are fairly complex. Drop me a note directly if you need to. Bob Oesterlin Sr Principal Storage Engineer, Nuance From cgirda at wustl.edu Tue Jul 18 15:57:05 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Tue, 18 Jul 2017 09:57:05 -0500 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana In-Reply-To: References: Message-ID: Bob, Found the issue: https was getting blocked with the "direct" connection. Switched it to proxy on the bridge-port. That helped and now I can see graphs. Thank you Chakri On 7/18/17 6:45 AM, Oesterlin, Robert wrote: > Hi Chakri > > If you're getting the 'ole 'spinning wheel' on your dashboard, then it's one of two things: > > 1) The Grafana bridge is not running > 2) The dashboard is requesting a metric that isn't available. > > Assuming that you've verified that the pmcollector/pmsensor setup is working right in your cluster, I'd then start looking at the log files for the Grafana Bridge and the pmcollector to see if you can determine if either is producing an error - like the metric wasn't found. The other thing to try is to set up a small test graph with a known metric being collected by your pmsensor configuration, rather than try one of Helene's default dashboards, which are fairly complex. > > Drop me a note directly if you need to. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Tue Jul 18 18:21:06 2017 From: david_johnson at brown.edu (David Johnson) Date: Tue, 18 Jul 2017 13:21:06 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited Message-ID: We also noticed a fair amount of CPU time accumulated by mmsysmon.py on our diskless compute nodes. I read the earlier query, where it was answered: > ces == Cluster Export Services, mmsysmon.py comes from mmcesmon. It is used for managing export services of GPFS. If it is killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't attempt to kill them. > Our question is this - we don't run the latest "protocols", our NFS is CNFS, and our CIFS is clustered CIFS. I can understand it might be needed with Ganesha, but on every node? Why in the world would I be getting this daemon running on all client nodes, when I didn't install the "protocols" version of the distribution? We have release 4.2.2 at the moment. How can we disable this? Thanks, - ddj -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jul 18 18:51:21 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 18 Jul 2017 17:51:21 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: There's no official way to cleanly disable it so far as I know yet; but you can de facto disable it by deleting /var/mmfs/mmsysmon/mmsysmonitor.conf. It's a huge problem.
I don?t understand why it hasn?t been given much credit by dev or support. ~jonathon On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of David Johnson" wrote: We also noticed a fair amount of CPU time accumulated by mmsysmon.py on our diskless compute nodes. I read the earlier query, where it was answered: ces == Cluster Export Services, mmsysmon.py comes from mmcesmon. It is used for managing export services of GPFS. If it is killed, your nfs/smb etc will be out of work. Their overhead is small and they are very important. Don't attempt to kill them. Our question is this ? we don?t run the latest ?protocols", our NFS is CNFS, and our CIFS is clustered CIFS. I can understand it might be needed with Ganesha, but on every node? Why in the world would I be getting this daemon running on all client nodes, when I didn?t install the ?protocols" version of the distribution? We have release 4.2.2 at the moment. How can we disable this? Thanks, ? ddj From S.J.Thompson at bham.ac.uk Tue Jul 18 20:21:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 18 Jul 2017 19:21:46 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: So just following up on my questions from January. We tried to do 2. I.e. Restore to a new file-system with different block sizes. It got part way through creating the file-sets on the new SOBAR file-system and then GPFS asserts and crashes... We weren't actually intentionally trying to move block sizes, but because we were restoring from a traditional SAN based system to a shiny new GNR based system, we'd manually done the FS create steps. I have a PMR open now. I don't know if someone internally in IBM actually tried this after my emails, as apparently there is a similar internal defect which is ~6 months old... Simon From: > on behalf of Marc A Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 20 January 2017 at 17:57 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. From: "Simon Thompson (Research Computing - IT Services)" > To: "gpfsug-discuss at spectrumscale.org" > Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? 
We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From leslie.james.elliott at gmail.com Wed Jul 19 08:22:49 2017 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Wed, 19 Jul 2017 17:22:49 +1000 Subject: [gpfsug-discuss] AFM over NFS Message-ID: we are having a problem linking a target to a fileset we are able to manually connect with NFSv4 to the correct path on an NFS export down a particular subdirectory path, but when when we create a fileset with this same path as an afmTarget it connects with NFSv3 and actually connects to the top of the export even though mmafmctl displays the extended path information are we able to tell AFM to connect with NFSv4 in any way to work around this problem the NFS comes from a closed system, we can not change the configuration on it to fix the problem on the target thanks leslie -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Wed Jul 19 08:53:58 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 19 Jul 2017 07:53:58 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: I?m having a play with this now too. Has anybody coded a systemd unit to handle step 2b in the knowledge centre article ? bridge creation on the gpfs side? It would save me a bit of effort. I?m also wondering about the CherryPy version. It looks like this has been developed on SLES which has the newer version mentioned as a standard package and yet RHEL with an older version of CherryPy is perhaps more common as it seems to have the best support for features of GPFS, like object and block protocols. Maybe SLES is in favour now? 
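For what it's worth, the kind of thing I was going to hack up myself is roughly this - an untested sketch, where the install path of zimonGrafanaIntf.py and the collector host are pure assumptions for my setup:

# write a unit file for the bridge (path and host below are assumptions)
cat > /etc/systemd/system/zimon-grafana-bridge.service <<'EOF'
[Unit]
Description=IBM Spectrum Scale ZIMon to Grafana bridge
# start after the collector; keep retrying while the collector is still initialising
After=network.target pmcollector.service

[Service]
Type=simple
WorkingDirectory=/opt/IBM/zimon-grafana-bridge
ExecStart=/usr/bin/python /opt/IBM/zimon-grafana-bridge/zimonGrafanaIntf.py -s localhost
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable zimon-grafana-bridge.service
systemctl start zimon-grafana-bridge.service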
Cheers, Greg From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Thursday, 6 July 2017 3:07 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Greetings, I'm currently setting up Grafana to interact with one of our Scale Clusters and i've followed the knowledge centre link in terms of setup. https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm However while everything appears to be working i'm not seeing any data coming through the reports within the grafana server, even though I can see data in the Scale GUI The current environment: [root at sc01n02 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: sc01.spectrum GPFS cluster id: 18085710661892594990 GPFS UID domain: sc01.spectrum Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------ 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon [root at sc01n02 ~]# [root at sc01n02 ~]# mmlsconfig Configuration data for cluster sc01.spectrum: --------------------------------------------- clusterName sc01.spectrum clusterId 18085710661892594990 autoload yes profile gpfsProtocolDefaults dmapiFileHandleSize 32 minReleaseLevel 4.2.2.0 ccrEnabled yes cipherList AUTHONLY maxblocksize 16M [cesNodes] maxMBpS 5000 numaMemoryInterleave yes enforceFilesetQuotaOnRoot yes workerThreads 512 [common] tscCmdPortRange 60000-61000 cesSharedRoot /ibm/cesSharedRoot/ces cifsBypassTraversalChecking yes syncSambaMetadataOps yes cifsBypassShareLocksOnRename yes adminMode central File systems in cluster sc01.spectrum: -------------------------------------- /dev/cesSharedRoot /dev/icos_demo /dev/scale01 [root at sc01n02 ~]# [root at sc01n02 ~]# systemctl status pmcollector ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. Loaded: loaded (/etc/rc.d/init.d/pmcollector) Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago Docs: man:systemd-sysv-generator(8) Main PID: 2693 (ZIMonCollector) CGroup: /system.slice/pmcollector.service ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance mon...... May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor collector... May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance moni...r.. Hint: Some lines were ellipsized, use -l to show in full. From Grafana Server: [cid:image002.jpg at 01D300B7.CFE73E50] when I send a set of files to the cluster (3.8GB) I can see performance metrics within the Scale GUI [cid:image004.jpg at 01D300B7.CFE73E50] yet from the Grafana Dashboard im not seeing any data points [cid:image006.jpg at 01D300B7.CFE73E50] Can anyone provide some hints as to what might be happening? Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 19427 bytes Desc: image002.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 84412 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 37285 bytes Desc: image006.jpg URL: From janfrode at tanso.net Wed Jul 19 12:09:48 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 19 Jul 2017 11:09:48 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: Nils Haustein did such a migration from v7000 Unified to ESS last year. Used SOBAR to avoid recalls from HSM. I believe he wrote a whitepaper on the process.. -jf tir. 18. jul. 2017 kl. 21.21 skrev Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk>: > So just following up on my questions from January. > > We tried to do 2. I.e. Restore to a new file-system with different block > sizes. It got part way through creating the file-sets on the new SOBAR > file-system and then GPFS asserts and crashes... We weren't actually > intentionally trying to move block sizes, but because we were restoring > from a traditional SAN based system to a shiny new GNR based system, we'd > manually done the FS create steps. > > I have a PMR open now. I don't know if someone internally in IBM actually > tried this after my emails, as apparently there is a similar internal > defect which is ~6 months old... > > Simon > > From: on behalf of Marc A > Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: Friday, 20 January 2017 at 17:57 > > To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions > > I worked on some aspects of SOBAR, but without studying and testing the > commands - I'm not in a position right now to give simple definitive > answers - > having said that.... > > Generally your questions are reasonable and the answer is: "Yes it should > be possible to do that, but you might be going a bit beyond the design > point.., > so you'll need to try it out on a (smaller) test system with some smaller > tedst files. > > Point by point. > > 1. If SOBAR is unable to restore a particular file, perhaps because the > premigration did not complete -- you should only lose that particular file, > and otherwise "keep going". > > 2. I think SOBAR helps you build a similar file system to the original, > including block sizes. So you'd have to go in and tweak the file system > creation step(s). > I think this is reasonable... If you hit a problem... IMO that would be a > fair APAR. > > 3. Similar to 2. > > > > > > From: "Simon Thompson (Research Computing - IT Services)" < > S.J.Thompson at bham.ac.uk> > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 01/20/2017 10:44 AM > Subject: [gpfsug-discuss] SOBAR questions > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We've recently been looking at deploying SOBAR to support DR of some of > our file-systems, I have some questions (as ever!) that I can't see are > clearly documented, so was wondering if anyone has any insight on this. > > 1. If we elect not to premigrate certain files, are we still able to use > SOBAR? 
We are happy to take a hit that those files will never be available > again, but some are multi TB files which change daily and we can't stream > to tape effectively. > > 2. When doing a restore, does the block size of the new SOBAR'd to > file-system have to match? For example the old FS was 1MB blocks, the new > FS we create with 2MB blocks. Will this work (this strikes me as one way > we might be able to migrate an FS to a new block size?)? > > 3. If the file-system was originally created with an older GPFS code but > has since been upgraded, does restore work, and does it matter what client > code? E.g. We have a file-system that was originally 3.5.x, its been > upgraded over time to 4.2.2.0. Will this work if the client code was say > 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 > (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file > system version". Say there was 4.2.2.5 which created version 16.01 > file-system as the new FS, what would happen? > > This sort of detail is missing from: > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s > cale.v4r22.doc/bl1adv_sobarrestore.htm > > But is probably quite important for us to know! > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 19 12:26:43 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 19 Jul 2017 11:26:43 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: Getting this: python zimonGrafanaIntf.py ?s < pmcollector host> via system is a bit of a tricky process, since this process will abort unless the pmcollector is fully up. With a large database, I?ve seen it take 3-5 mins for pmcollector to fully initialize. I?m sure a simple ?sleep and try again? wrapper would take care of that. It?s on my lengthy to-do list! On the CherryPy version - I run the bridge on my RH/Centos system with python 3.4 and used ?pip install cherrypy? and it picked up the latest version. Seems to work just fine. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Greg.Lehmann at csiro.au" Reply-To: gpfsug main discussion list Date: Wednesday, July 19, 2017 at 2:54 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data I?m having a play with this now too. Has anybody coded a systemd unit to handle step 2b in the knowledge centre article ? bridge creation on the gpfs side? It would save me a bit of effort. I?m also wondering about the CherryPy version. It looks like this has been developed on SLES which has the newer version mentioned as a standard package and yet RHEL with an older version of CherryPy is perhaps more common as it seems to have the best support for features of GPFS, like object and block protocols. Maybe SLES is in favour now? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From MDIETZ at de.ibm.com Wed Jul 19 14:05:49 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Wed, 19 Jul 2017 15:05:49 +0200 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. > > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? 
ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Wed Jul 19 14:28:23 2017 From: david_johnson at brown.edu (David Johnson) Date: Wed, 19 Jul 2017 09:28:23 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: I have opened a PMR, and the official response reflects what you just posted. In addition, it seems there are some performance issues with Python 2 that will be improved with eventual migration to Python 3. I was unaware of the mmhealth functions that the mmsysmon daemon provides. The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded. I suppose it would be possible to turn off mmsysmon during the benchmarking, but I appreciate the effort at streamlining the monitor service. Cutting back on fork/exec, better python, less polling, more notifications? all good. Thanks for the details, ? ddj > On Jul 19, 2017, at 9:05 AM, Mathias Dietz wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. 
> > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdharris at us.ibm.com Wed Jul 19 15:40:17 2017 From: mdharris at us.ibm.com (Michael D Harris) Date: Wed, 19 Jul 2017 10:40:17 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: Hi David, Re: "The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded." MPI workloads show the most mmhealth impact. Specifically the more sensitive the workload is to jitter the higher the potential impact. The mmhealth config interval, as per Mathias's link, is a scalar applied to all monitor interval values in the configuration file. As such it currently modifies the server side monitoring and health reporting in addition to mitigating mpi client impact. So "medium" == 5 is a good perhaps reasonable value - whereas the "slow" == 10 scalar may be too infrequent for your server side monitoring and reporting (so your 30 second update becomes 5 minutes). The clock alignment that Mathias mentioned is a new investigatory undocumented tool for MPI workloads. It nearly completely removes all mmhealth MPI jitter while retaining default monitor intervals. It also naturally generates thundering herds of all client reporting to the quorum nodes. So while you may mitigate the client MPI jitter you may severely impact the server throughput on those intervals if not also exceed connection and thread limits. Configuring "clients" separately from "servers" without resorting to alignment is another area of investigation. I'm not familiar with your PMR but as Mathias mentioned "mmhealth config interval medium" would be a good start. In testing that Kums and I have done the "mmhealth config interval medium" value provides mitigation almost as good as the mentioned clock alignment for MPI for say a psnap with barrier type workload . 
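For reference, the commands involved are just these (a quick sketch - check the mmhealth section of the 4.2.3 docs for the exact behaviour on your code level):

# scale all monitor polling intervals by the "medium" factor (5x, as described above)
mmhealth config interval medium

# confirm monitoring is still reporting as expected afterwards
mmhealth node show
mmhealth cluster show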
Regards, Mike Harris IBM Spectrum Scale - Core Team From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/19/2017 09:28 AM Subject: gpfsug-discuss Digest, Vol 66, Issue 30 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: mmsysmon.py revisited (Mathias Dietz) 2. Re: mmsysmon.py revisited (David Johnson) ---------------------------------------------------------------------- Message: 1 Date: Wed, 19 Jul 2017 15:05:49 +0200 From: "Mathias Dietz" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited Message-ID: Content-Type: text/plain; charset="iso-8859-1" thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. 
> > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/8c0e33e9/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 19 Jul 2017 09:28:23 -0400 From: David Johnson To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited Message-ID: Content-Type: text/plain; charset="utf-8" I have opened a PMR, and the official response reflects what you just posted. In addition, it seems there are some performance issues with Python 2 that will be improved with eventual migration to Python 3. I was unaware of the mmhealth functions that the mmsysmon daemon provides. The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded. I suppose it would be possible to turn off mmsysmon during the benchmarking, but I appreciate the effort at streamlining the monitor service. Cutting back on fork/exec, better python, less polling, more notifications? all good. Thanks for the details, ? ddj > On Jul 19, 2017, at 9:05 AM, Mathias Dietz wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. 
(mmhealth config interval) > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm < https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss < http://gpfsug.org/mailman/listinfo/gpfsug-discuss> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/669c525b/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 66, Issue 30 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathon.anderson at colorado.edu Wed Jul 19 18:52:14 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 19 Jul 2017 17:52:14 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? ~jonathon On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. 
> > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Wed Jul 19 19:12:37 2017 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Wed, 19 Jul 2017 14:12:37 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. 
> > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. >> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. >> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Wed Jul 19 19:29:22 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 19 Jul 2017 18:29:22 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> References: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> Message-ID: OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU for packet processing, whereas Mellanox IB uses a discrete asic on the HBA. As a result, OPA is much more sensitive to task placement and interrupts, in our experience, because the host CPU load competes with the fabric IO processing load. 
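One way to put a rough number on that sensitivity, rather than arguing from impressions, is to sample the monitor's CPU time and context switches while a benchmark is running, together with per-CPU interrupt growth. A sketch only, assuming the sysstat tools are installed and that the monitor shows up in the process table as mmsysmon.py (adjust the pgrep pattern if your install names it differently):

# CPU usage and context switches attributed to the health monitor,
# sampled every 5 seconds for two minutes
pidstat -u -w -p $(pgrep -f mmsysmon | head -1) 5 24

# per-CPU interrupt growth over a one-minute window
cat /proc/interrupts > /tmp/irq.before; sleep 60; cat /proc/interrupts > /tmp/irq.after
diff /tmp/irq.before /tmp/irq.after

Comparing those numbers between an idle node and a node running the MPI benchmark gives a first-order estimate of how much of the jitter comes from the monitor as opposed to the fabric driver.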
~jonathon On 7/19/17, 12:12 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu" wrote: We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. 
>> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. >> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From john.hearns at asml.com Thu Jul 20 08:39:29 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 20 Jul 2017 07:39:29 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> Message-ID: This is really interesting. I know we can look at the interrupt rates of course, but is there a way we can quantify the effects of interrupts / OS jitter here? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: Wednesday, July 19, 2017 8:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU for packet processing, whereas Mellanox IB uses a discrete asic on the HBA. As a result, OPA is much more sensitive to task placement and interrupts, in our experience, because the host CPU load competes with the fabric IO processing load. ~jonathon On 7/19/17, 12:12 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu" wrote: We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. 
Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_4.2.3%2Fcom.ibm.spectrum.scale.v4r23.doc%2Fbl1adm_mmhealth.htm&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=Uzdg4ogcQwidNfi8TMp%2FdCMqnSLTFxU4y8n2ub%2F28xQ%3D&reserved=0 > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. >> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. 
>> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From MDIETZ at de.ibm.com Thu Jul 20 10:30:50 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Thu, 20 Jul 2017 11:30:50 +0200 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: Jonathon, its important to separate the two issues "high CPU consumption" and "CPU Jitter". As mentioned, we are aware of the CPU jitter issue and already put several improvements in place. (more to come with the next release) Did you try with a lower polling frequency and/or enabling clock alignment as Mike suggested ? Non-MPI workloads are usually not impacted by CPU jitter, but might be impacted by high CPU consumption. 
But we don't see such such high CPU consumption in the lab and therefore ask affected customers to get in contact with IBM support to find the root cause. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/19/2017 07:52:14 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/19/2017 07:52 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > It might be a problem specific to your system environment or a > wrong configuration therefore please get in contact with IBM support > to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing > jitter in conflict with MPI on the shared Intel Omni-Path network, > in our case. > > We?ve already tried pursuing support on this through our vendor, > DDN, and got no-where. Eventually we were the ones who tried killing > mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU > consumption by mmsysmon on our test systems? isn?t helping. Do you > have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Mathias Dietz" on behalf of MDIETZ at de.ibm.com> wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for > the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because > it monitors the individual components and provides health state > information and error events. > > This information is needed by other Spectrum Scale components > (mmhealth command, the IBM Spectrum Scale GUI, Support tools, > Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > > much credit by dev or support. > > Over the last couple of month, the development team has put a > strong focus on this topic. > > In order to monitor the health of the individual components, > mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and > replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the > ability to configure the polling frequency to reduce the overhead. > (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/ > com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the > monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by > mmsysmon on our test systems. > > It might be a problem specific to your system environment or a > wrong configuration therefore please get in contact with IBM support > to analyze the root cause of the high usage. 
> > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by > mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on > every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 15:57:14 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 09:57:14 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR Message-ID: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> Hi There, I was running a bridge port services to push my stats to grafana. It was running fine until we started some rigorous IOPS testing on the cluster. Now its failing to start with the following error. Questions: 1. Any clues on it fix? 2. Is there anyway I can run this in a service/daemon mode rather than running in a screen session? 
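On the second question, a small systemd unit is usually enough to replace the screen session; the startup failure shown below is independent of how the bridge is launched. A sketch only, assuming the script stays in the directory it is run from today and that the collector runs on the same host; the unit name, paths and restart policy are illustrative and should be adjusted to the local install:

cat > /etc/systemd/system/zimon-grafana-bridge.service <<'EOF'
[Unit]
Description=ZIMon to Grafana bridge (zimonGrafanaIntf)
After=network-online.target pmcollector.service

[Service]
Type=simple
# adjust to wherever zimonGrafanaIntf.py actually lives
WorkingDirectory=/opt/IBM/zimon/zimonGrafanaIntf
ExecStart=/usr/bin/python zimonGrafanaIntf.py -s linuscs107.gsc.wustl.edu
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable zimon-grafana-bridge.service
systemctl start zimon-grafana-bridge.service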
[root at linuscs107 zimonGrafanaIntf]# python zimonGrafanaIntf.py -s linuscs107.gsc.wustl.edu
Failed to initialize MetadataHandler, please check log file for reason

#cat pmmonitor.log

2017-07-20 09:41:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute:
Error sending query in execute, quitting
2017-07-20 09:41:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute:
Error sending query in execute, quitting
2017-07-20 09:41:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute:
Error sending query in execute, quitting

Thank you
Chakri

From Robert.Oesterlin at nuance.com Thu Jul 20 16:06:48 2017
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Thu, 20 Jul 2017 15:06:48 +0000
Subject: [gpfsug-discuss] mmsysmon and CCR
Message-ID:

I recently ran into an issue where the frequency of mmsysmon polling (GPFS 4.2.2) was causing issues with CCR updates. I eventually ended up setting the polling interval to 30 mins (I don't have any CES), which seemed to solve the issue.

So, if you have a large cluster, be on the lookout for CCR issues, if you have that configured.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cgirda at wustl.edu Thu Jul 20 17:38:25 2017
From: cgirda at wustl.edu (Chakravarthy Girda)
Date: Thu, 20 Jul 2017 11:38:25 -0500
Subject: [gpfsug-discuss] pmmonitor - ERROR
In-Reply-To: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu>
References: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu>
Message-ID: <31b9b441-f51c-c0d1-11e0-b01a070f9e4e@wustl.edu>

cat zserver.log

2017-07-20 11:21:59,001 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED)
2017-07-20 11:32:29,090 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED)

Thank you
Chakri

On 7/20/17 9:57 AM, Chakravarthy Girda wrote:
> Hi There,
>
> I was running a bridge port services to push my stats to grafana. It
> was running fine until we started some rigorous IOPS testing on the
> cluster. Now its failing to start with the following error.
>
> Questions:
>
> 1. Any clues on it fix?
> 2. Is there anyway I can run this in a service/daemon mode rather than
> running in a screen session?
>
>
> [root at linuscs107 zimonGrafanaIntf]# python zimonGrafanaIntf.py -s
> linuscs107.gsc.wustl.edu
> Failed to initialize MetadataHandler, please check log file for reason
>
> #cat pmmonitor.log
>
> 2017-07-20 09:41:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute:
> Error sending query in execute, quitting
> 2017-07-20 09:41:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute:
> Error sending query in execute, quitting
> 2017-07-20 09:41:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute:
> Error sending query in execute, quitting
>
>
> Thank you
> Chakri
>
>
>
>
>

From Robert.Oesterlin at nuance.com Thu Jul 20 17:50:12 2017
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Thu, 20 Jul 2017 16:50:12 +0000
Subject: [gpfsug-discuss] pmmonitor - ERROR
Message-ID: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com>

This looks like the Grafana bridge could not connect to the pmcollector process - is it running normally? See if some of the normal "mmperfmon"
commands work and/or look at the log file on the pmcollector node. (under /var/log/zimon) You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) Bob Oesterlin Sr Principal Storage Engineer, Nuance On 7/20/17, 11:38 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Chakravarthy Girda" wrote: 2017-07-20 11:32:29,090 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) From mdharris at us.ibm.com Thu Jul 20 17:55:56 2017 From: mdharris at us.ibm.com (Michael D Harris) Date: Thu, 20 Jul 2017 12:55:56 -0400 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 66, Issue 34 In-Reply-To: References: Message-ID: Hi Bob, The CCR monitor interval is addressed in 4.2.3 or 4.2.3 ptf1 Regards, Mike Harris Spectrum Scale Development - Core Team -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 18:12:09 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 12:12:09 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> Message-ID: Bob, Your correct. Found the issues with pmcollector services. Fixed issues with pmcollector, resolved the issues. Thank you Chakri On 7/20/17 11:50 AM, Oesterlin, Robert wrote: > You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) From cgirda at wustl.edu Thu Jul 20 18:30:03 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 12:30:03 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> Message-ID: <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> Bob, Actually the pmcollector service died in 5min. 2017-07-20 12:11:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:16:29,470 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:16:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:21:29,384 - pmmonitor - ERROR - QueryHandler: Socket connection broken, received no data 2017-07-20 12:21:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 12:21:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 12:21:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting Thank you Chakri On 7/20/17 12:12 PM, Chakravarthy Girda wrote: > Bob, > > Your correct. Found the issues with pmcollector services. Fixed issues > with pmcollector, resolved the issues. > > > Thank you > > Chakri > > > On 7/20/17 11:50 AM, Oesterlin, Robert wrote: >> You will also see this node when the pmcollector process is still initializing. 
(reading in the existing database, not ready to service requests) From cgirda at wustl.edu Thu Jul 20 21:03:56 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 15:03:56 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> Message-ID: For now I switched the "zimonGrafanaIntf" to port "4262". So far it didn't crash the pmcollector. Will wait for some more time to ensure its working. * Can we start this process in a daemon or service mode? Thank you Chakri On 7/20/17 12:30 PM, Chakravarthy Girda wrote: > Bob, > > Actually the pmcollector service died in 5min. > > 2017-07-20 12:11:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:16:29,470 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:16:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:21:29,384 - pmmonitor - ERROR - QueryHandler: Socket > connection broken, received no data > 2017-07-20 12:21:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 12:21:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 12:21:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > > Thank you > Chakri > > > On 7/20/17 12:12 PM, Chakravarthy Girda wrote: >> Bob, >> >> Your correct. Found the issues with pmcollector services. Fixed issues >> with pmcollector, resolved the issues. >> >> >> Thank you >> >> Chakri >> >> >> On 7/20/17 11:50 AM, Oesterlin, Robert wrote: >>> You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From cgirda at wustl.edu Thu Jul 20 21:42:09 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 15:42:09 -0500 Subject: [gpfsug-discuss] zimonGrafanaIntf template variable Message-ID: <00372fdc-a0b7-26ac-84c1-aa32c78e4261@wustl.edu> Hi, I imported the pre-built grafana dashboard. https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/a180eb7e-9161-4e07-a6e4-35a0a076f7b3/attachment/5e9a5886-5bd9-4a6f-919e-bc66d16760cf/media/default%20dashboards%20set.zip Get updates from few graphs but not all. I realize that I need to update the template variables. Eg:- I get into the "File Systems View" Variable ( gpfsMetrics_fs1 ) --> Query ( gpfsMetrics_fs1 ) Regex ( /.*[^gpfs_fs_inode_used|gpfs_fs_inode_alloc|gpfs_fs_inode_free|gpfs_fs_inode_max]/ ) Question: * How can I execute the above Query and regex to fix the issues. * Is there any document on CLI options? Thank you Chakri -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jul 21 22:13:17 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 21 Jul 2017 17:13:17 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Message-ID: <28986.1500671597@turing-police.cc.vt.edu> So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive service. 
Inode size is 4K, and we had a requirement to encrypt-at-rest, so encryption is in play as well. Data is replicated 2x and fragment size is 32K. I was investigating how much data-in-inode would help deal with users who put large trees of small files into the archive (yes, I know we can use applypolicy with external programs to tarball offending directories, but that's a separate discussion ;) ## ls -ls * 64 -rw-r--r-- 1 root root 2048 Jul 21 14:47 random.data 64 -rw-r--r-- 1 root root 512 Jul 21 14:48 random.data.1 64 -rw-r--r-- 1 root root 128 Jul 21 14:50 random.data.2 64 -rw-r--r-- 1 root root 32 Jul 21 14:50 random.data.3 64 -rw-r--r-- 1 root root 16 Jul 21 14:50 random.data.4 Hmm.. I was expecting at least *some* of these to fit in the inode, and not take 2 32K blocks... ## mmlsattr -d -L random.data.4 file name: random.data.4 metadata replication: 2 max 2 data replication: 2 max 2 immutable: no appendOnly: no flags: storage pool name: system fileset name: root snapshot name: creation time: Fri Jul 21 14:50:51 2017 Misc attributes: ARCHIVE Encrypted: yes gpfs.Encryption: 0x4541 (... another 296 hex digits) EncPar 'AES:256:XTS:FEK:HMACSHA512' type: wrapped FEK WrpPar 'AES:KWRAP' CmbPar 'XORHMACSHA512' KEY-97c7f4b7-06cb-4a53-b317-1c187432dc62:archKEY1_gpfsG1 Hmm.. Doesn't *look* like enough extended attributes to prevent storing even 16 bytes in the inode, should be room for around 3.5K minus the above 250 bytes or so of attributes.... What am I missing here? Does "encrypted" or LTFS/EE disable data-in-inode? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From oehmes at gmail.com Fri Jul 21 23:04:32 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 21 Jul 2017 22:04:32 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <28986.1500671597@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: Hi, i talked with a few others to confirm this, but unfortunate this is a limitation of the code today (maybe not well documented which we will look into). Encryption only encrypts data blocks, it doesn't encrypt metadata. Hence, if encryption is enabled, we don't store data in the inode, because then it wouldn't be encrypted. For the same reason HAWC and encryption are incompatible. Sven On Fri, Jul 21, 2017 at 2:13 PM wrote: > So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive > service. > Inode size is 4K, and we had a requirement to encrypt-at-rest, so > encryption > is in play as well. Data is replicated 2x and fragment size is 32K. > > I was investigating how much data-in-inode would help deal with users who > put > large trees of small files into the archive (yes, I know we can use > applypolicy > with external programs to tarball offending directories, but that's a > separate > discussion ;) > > ## ls -ls * > 64 -rw-r--r-- 1 root root 2048 Jul 21 14:47 random.data > 64 -rw-r--r-- 1 root root 512 Jul 21 14:48 random.data.1 > 64 -rw-r--r-- 1 root root 128 Jul 21 14:50 random.data.2 > 64 -rw-r--r-- 1 root root 32 Jul 21 14:50 random.data.3 > 64 -rw-r--r-- 1 root root 16 Jul 21 14:50 random.data.4 > > Hmm.. I was expecting at least *some* of these to fit in the inode, and > not take 2 32K blocks... 
> > ## mmlsattr -d -L random.data.4 > file name: random.data.4 > metadata replication: 2 max 2 > data replication: 2 max 2 > immutable: no > appendOnly: no > flags: > storage pool name: system > fileset name: root > snapshot name: > creation time: Fri Jul 21 14:50:51 2017 > Misc attributes: ARCHIVE > Encrypted: yes > gpfs.Encryption: 0x4541 (... another 296 hex digits) > EncPar 'AES:256:XTS:FEK:HMACSHA512' > type: wrapped FEK WrpPar 'AES:KWRAP' CmbPar 'XORHMACSHA512' > KEY-97c7f4b7-06cb-4a53-b317-1c187432dc62:archKEY1_gpfsG1 > > Hmm.. Doesn't *look* like enough extended attributes to prevent storing > even > 16 bytes in the inode, should be room for around 3.5K minus the above 250 > bytes > or so of attributes.... > > What am I missing here? Does "encrypted" or LTFS/EE disable data-in-inode? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jul 21 23:24:13 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 21 Jul 2017 18:24:13 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <33069.1500675853@turing-police.cc.vt.edu> On Fri, 21 Jul 2017 22:04:32 -0000, Sven Oehme said: > i talked with a few others to confirm this, but unfortunate this is a > limitation of the code today (maybe not well documented which we will look > into). Encryption only encrypts data blocks, it doesn't encrypt metadata. > Hence, if encryption is enabled, we don't store data in the inode, because > then it wouldn't be encrypted. For the same reason HAWC and encryption are > incompatible. I can live with that restriction if it's documented better, thanks... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From p.childs at qmul.ac.uk Mon Jul 24 10:29:49 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 09:29:49 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. Message-ID: <1500888588.571.3.camel@qmul.ac.uk> We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. From ilan84 at gmail.com Mon Jul 24 11:36:41 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Mon, 24 Jul 2017 13:36:41 +0300 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication Message-ID: Hi, I have gpfs with 2 Nodes (redhat). 
I am trying to create NFS share - So I would be able to mount and access it from another linux machine. I receive error: Current authentication: none is invalid. What do i need to configure ? PLEASE NOTE: I dont have the SMB package at the moment, I dont want authentication on the NFS export.. While trying to create NFS (I execute the following): [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" I receive the following error: [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmuserauth service list FILE access not configured PARAMETERS VALUES ------------------------------------------------- OBJECT access not configured PARAMETERS VALUES ------------------------------------------------- [root at LH20-GPFS1 ~]# Some additional information on cluster: ============================== [root at LH20-GPFS1 ~]# mmlsmgr file system manager node ---------------- ------------------ fs_gpfs01 10.10.158.61 (LH20-GPFS1) Cluster manager node: 10.10.158.61 (LH20-GPFS1) [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: LH20-GPFS1 GPFS cluster id: 10777108240438931454 GPFS UID domain: LH20-GPFS1 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 quorum From jonathan at buzzard.me.uk Mon Jul 24 12:43:10 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Jul 2017 12:43:10 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <28986.1500671597@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <1500896590.4387.167.camel@buzzard.me.uk> On Fri, 2017-07-21 at 17:13 -0400, valdis.kletnieks at vt.edu wrote: > So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive service. > Inode size is 4K, and we had a requirement to encrypt-at-rest, so encryption > is in play as well. Data is replicated 2x and fragment size is 32K. > For an archive service how about only accepting files in actual "archive" formats and then severely restricting the number of files a user can have? By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. Has a number of effects. Firstly it makes the files "big" so they move to tape efficiently. It also makes it less likely the end user will try and use it as an general purpose file server. As it's an archive there should be no problem for the user to bundle all the files into a .zip file or similar. Noting that Windows Vista and up handle ZIP64 files getting around the older 4GB and 65k files limit. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
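Back on the NFS export question earlier in this digest: the "Current authentication: none is invalid" message is CES refusing to create any export until file authentication has been configured, even when no directory service is wanted. A commonly used minimal sequence for that situation is sketched below; it assumes the "userdefined" mode (UIDs and GIDs are taken as-is from the clients) is acceptable, so check the mmuserauth documentation for your release before relying on it:

mmuserauth service create --data-access-method file --type userdefined
mmuserauth service list
mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)"
mmnfs export list

With userdefined authentication there is no external identity mapping, so the numeric UIDs and GIDs on the NFS clients need to match the ones on the GPFS nodes.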
From stefan.dietrich at desy.de Mon Jul 24 13:19:47 2017 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Mon, 24 Jul 2017 14:19:47 +0200 (CEST) Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: <1981958989.2609398.1500898787132.JavaMail.zimbra@desy.de> Yep, have look at this Gist [1] The unit files assumes some paths and users, which are created during the installation of my RPM. [1] https://gist.github.com/stdietrich/b3b985f872ea648d6c03bb6249c44e72 Regards, Stefan ----- Original Message ----- > From: "Greg Lehmann" > To: gpfsug-discuss at spectrumscale.org > Sent: Wednesday, July 19, 2017 9:53:58 AM > Subject: Re: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data > I?m having a play with this now too. Has anybody coded a systemd unit to handle > step 2b in the knowledge centre article ? bridge creation on the gpfs side? It > would save me a bit of effort. > > > > I?m also wondering about the CherryPy version. It looks like this has been > developed on SLES which has the newer version mentioned as a standard package > and yet RHEL with an older version of CherryPy is perhaps more common as it > seems to have the best support for features of GPFS, like object and block > protocols. Maybe SLES is in favour now? > > > > Cheers, > > > > Greg > > > > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie > Sent: Thursday, 6 July 2017 3:07 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no > data > > > > > Greetings, > > > > > > > > > I'm currently setting up Grafana to interact with one of our Scale Clusters > > > and i've followed the knowledge centre link in terms of setup. 
> > > > > > [ > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm > | > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm > ] > > > > > > However while everything appears to be working i'm not seeing any data coming > through the reports within the grafana server, even though I can see data in > the Scale GUI > > > > > > The current environment: > > > > > > [root at sc01n02 ~]# mmlscluster > > > GPFS cluster information > ======================== > GPFS cluster name: sc01.spectrum > GPFS cluster id: 18085710661892594990 > GPFS UID domain: sc01.spectrum > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > > Node Daemon node name IP address Admin node name Designation > ------------------------------------------------------------------ > 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon > 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon > 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon > > > [root at sc01n02 ~]# > > > > > > > > > [root at sc01n02 ~]# mmlsconfig > Configuration data for cluster sc01.spectrum: > --------------------------------------------- > clusterName sc01.spectrum > clusterId 18085710661892594990 > autoload yes > profile gpfsProtocolDefaults > dmapiFileHandleSize 32 > minReleaseLevel 4.2.2.0 > ccrEnabled yes > cipherList AUTHONLY > maxblocksize 16M > [cesNodes] > maxMBpS 5000 > numaMemoryInterleave yes > enforceFilesetQuotaOnRoot yes > workerThreads 512 > [common] > tscCmdPortRange 60000-61000 > cesSharedRoot /ibm/cesSharedRoot/ces > cifsBypassTraversalChecking yes > syncSambaMetadataOps yes > cifsBypassShareLocksOnRename yes > adminMode central > > > File systems in cluster sc01.spectrum: > -------------------------------------- > /dev/cesSharedRoot > /dev/icos_demo > /dev/scale01 > [root at sc01n02 ~]# > > > > > > > > > [root at sc01n02 ~]# systemctl status pmcollector > ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. > Loaded: loaded (/etc/rc.d/init.d/pmcollector) > Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago > Docs: man:systemd-sysv-generator(8) > Main PID: 2693 (ZIMonCollector) > CGroup: /system.slice/pmcollector.service > ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... > ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... > > > May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance > mon...... > May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor > collector... > May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance > moni...r.. > Hint: Some lines were ellipsized, use -l to show in full. > > > > > > From Grafana Server: > > > > > > > > > > > > > > > when I send a set of files to the cluster (3.8GB) I can see performance metrics > within the Scale GUI > > > > > > > > > > > > yet from the Grafana Dashboard im not seeing any data points > > > > > > > > > > > > Can anyone provide some hints as to what might be happening? 
> Regards,
>
> Andrew Beattie
> Software Defined Storage - IT Specialist
> Phone: 614-2133-7927
> E-mail: abeattie at au1.ibm.com
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From jjdoherty at yahoo.com Mon Jul 24 14:11:12 2017
From: jjdoherty at yahoo.com (Jim Doherty)
Date: Mon, 24 Jul 2017 13:11:12 +0000 (UTC)
Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why.
References: <261384244.3866909.1500901872347.ref@mail.yahoo.com>
Message-ID: <261384244.3866909.1500901872347@mail.yahoo.com>

There are 3 places that the GPFS mmfsd uses memory: the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc. The statistics for memory pool id 2 are where the maxFilesToCache/maxStatCache objects are, and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.

You might want to upgrade to a later PTF, as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.


On Monday, July 24, 2017 5:29 AM, Peter Childs wrote:

We have two GPFS clusters.

One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.)

The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in their memory usage, starting at about 1.1G and are fine for a few days, however after a while they grow to 4.2G, which when the node needs to run real work, means the work can't be done.

I'm losing track of what may be different other than CCR, and I'm trying to find some more ideas of where to look.

I've checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old)

I'm not sure what else to look at on this one hence why I'm asking the community.

Thanks in advance

Peter Childs
ITS Research Storage
Queen Mary University of London.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From p.childs at qmul.ac.uk Mon Jul 24 14:30:49 2017
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Mon, 24 Jul 2017 13:30:49 +0000
Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why.
In-Reply-To: <261384244.3866909.1500901872347@mail.yahoo.com>
References: <261384244.3866909.1500901872347.ref@mail.yahoo.com>
 <261384244.3866909.1500901872347@mail.yahoo.com>
Message-ID: <1500903047.571.7.camel@qmul.ac.uk>

I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory, and does not seem to account for the excessive memory usage.

The new machines do have idleSocketTimeout set to 0; from what you're saying it could be related to keeping that many connections between nodes working.

Thanks in advance

Peter.
[root at dn29 ~]# mmdiag --memory

=== mmdiag: memory ===
mmfsd heap size: 2039808 bytes

Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
         128 bytes in use
 17500049370 hard limit on memory usage
     1048576 bytes committed to regions
           1 number of regions
         555 allocations
         555 frees
           0 allocation failures

Statistics for MemoryPool id 2 ("Shared Segment")
    42179592 bytes in use
 17500049370 hard limit on memory usage
    56623104 bytes committed to regions
           9 number of regions
      100027 allocations
       79624 frees
           0 allocation failures

Statistics for MemoryPool id 3 ("Token Manager")
     2099520 bytes in use
 17500049370 hard limit on memory usage
    16778240 bytes committed to regions
           1 number of regions
           4 allocations
           0 frees
           0 allocation failures


On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote:

There are 3 places that the GPFS mmfsd uses memory: the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc. The statistics for memory pool id 2 are where the maxFilesToCache/maxStatCache objects are, and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.

You might want to upgrade to a later PTF, as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.


On Monday, July 24, 2017 5:29 AM, Peter Childs wrote:

We have two GPFS clusters.

One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.)

The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in their memory usage, starting at about 1.1G and are fine for a few days, however after a while they grow to 4.2G, which when the node needs to run real work, means the work can't be done.

I'm losing track of what may be different other than CCR, and I'm trying to find some more ideas of where to look.

I've checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old)

I'm not sure what else to look at on this one hence why I'm asking the community.

Thanks in advance

Peter Childs
ITS Research Storage
Queen Mary University of London.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
--
Peter Childs
ITS Research Storage
Queen Mary, University of London
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jjdoherty at yahoo.com Mon Jul 24 15:10:45 2017
From: jjdoherty at yahoo.com (Jim Doherty)
Date: Mon, 24 Jul 2017 14:10:45 +0000 (UTC)
Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why.
In-Reply-To: <1500903047.571.7.camel@qmul.ac.uk>
References: <261384244.3866909.1500901872347.ref@mail.yahoo.com>
 <261384244.3866909.1500901872347@mail.yahoo.com>
 <1500903047.571.7.camel@qmul.ac.uk>
Message-ID: <1770436429.3911327.1500905445052@mail.yahoo.com>

How are you identifying the high memory usage?


On Monday, July 24, 2017 9:30 AM, Peter Childs wrote:

I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory, and does not seem to account for the excessive memory usage.
The new machines do have idleSocketTimeout set to 0; from what you're saying it could be related to keeping that many connections between nodes working.

Thanks in advance

Peter.

[root at dn29 ~]# mmdiag --memory

=== mmdiag: memory ===
mmfsd heap size: 2039808 bytes

Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
         128 bytes in use
 17500049370 hard limit on memory usage
     1048576 bytes committed to regions
           1 number of regions
         555 allocations
         555 frees
           0 allocation failures

Statistics for MemoryPool id 2 ("Shared Segment")
    42179592 bytes in use
 17500049370 hard limit on memory usage
    56623104 bytes committed to regions
           9 number of regions
      100027 allocations
       79624 frees
           0 allocation failures

Statistics for MemoryPool id 3 ("Token Manager")
     2099520 bytes in use
 17500049370 hard limit on memory usage
    16778240 bytes committed to regions
           1 number of regions
           4 allocations
           0 frees
           0 allocation failures

On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote:

There are 3 places that the GPFS mmfsd uses memory: the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc. The statistics for memory pool id 2 are where the maxFilesToCache/maxStatCache objects are, and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.

You might want to upgrade to a later PTF, as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.

On Monday, July 24, 2017 5:29 AM, Peter Childs wrote:

We have two GPFS clusters.

One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.)

The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in their memory usage, starting at about 1.1G and are fine for a few days, however after a while they grow to 4.2G, which when the node needs to run real work, means the work can't be done.

I'm losing track of what may be different other than CCR, and I'm trying to find some more ideas of where to look.

I've checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old)

I'm not sure what else to look at on this one hence why I'm asking the community.

Thanks in advance

Peter Childs
ITS Research Storage
Queen Mary University of London.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
--
Peter Childs
ITS Research Storage
Queen Mary, University of London

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From p.childs at qmul.ac.uk Mon Jul 24 15:21:27 2017
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Mon, 24 Jul 2017 14:21:27 +0000
Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why.
In-Reply-To: <1770436429.3911327.1500905445052@mail.yahoo.com>
References: <261384244.3866909.1500901872347.ref@mail.yahoo.com>
 <261384244.3866909.1500901872347@mail.yahoo.com>
 <1500903047.571.7.camel@qmul.ac.uk>
 <1770436429.3911327.1500905445052@mail.yahoo.com>
Message-ID: <1500906086.571.9.camel@qmul.ac.uk>

top

but ps gives the same value.

[root at dn29 ~]# ps auww -q 4444
USER       PID %CPU %MEM      VSZ     RSS TTY  STAT START  TIME COMMAND
root      4444  2.7 22.3 10537600 5472580 ?    S                /usr/lpp/mmfs/bin/mmfsd

Thanks for the help

Peter.

On Mon, 2017-07-24 at 14:10 +0000, Jim Doherty wrote:

How are you identifying the high memory usage?

On Monday, July 24, 2017 9:30 AM, Peter Childs wrote:

I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory, and does not seem to account for the excessive memory usage.

The new machines do have idleSocketTimeout set to 0; from what you're saying it could be related to keeping that many connections between nodes working.

Thanks in advance

Peter.

[root at dn29 ~]# mmdiag --memory

=== mmdiag: memory ===
mmfsd heap size: 2039808 bytes

Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
         128 bytes in use
 17500049370 hard limit on memory usage
     1048576 bytes committed to regions
           1 number of regions
         555 allocations
         555 frees
           0 allocation failures

Statistics for MemoryPool id 2 ("Shared Segment")
    42179592 bytes in use
 17500049370 hard limit on memory usage
    56623104 bytes committed to regions
           9 number of regions
      100027 allocations
       79624 frees
           0 allocation failures

Statistics for MemoryPool id 3 ("Token Manager")
     2099520 bytes in use
 17500049370 hard limit on memory usage
    16778240 bytes committed to regions
           1 number of regions
           4 allocations
           0 frees
           0 allocation failures

On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote:

There are 3 places that the GPFS mmfsd uses memory: the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc. The statistics for memory pool id 2 are where the maxFilesToCache/maxStatCache objects are, and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.

You might want to upgrade to a later PTF, as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.

On Monday, July 24, 2017 5:29 AM, Peter Childs wrote:

We have two GPFS clusters.

One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.)

The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in their memory usage, starting at about 1.1G and are fine for a few days, however after a while they grow to 4.2G, which when the node needs to run real work, means the work can't be done.

I'm losing track of what may be different other than CCR, and I'm trying to find some more ideas of where to look.

I've checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old)

I'm not sure what else to look at on this one hence why I'm asking the community.

Thanks in advance

Peter Childs
ITS Research Storage
Queen Mary University of London.
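For anyone else chasing this sort of growth: one rough way to narrow down where the resident set is actually going (pagepool vs. the shared segments vs. the daemon heap) is to snapshot the per-mapping breakdown of mmfsd over time and compare successive snapshots. The interval and log path below are arbitrary, so treat this as a sketch rather than a recommended procedure:

# append the 20 largest mappings of mmfsd to a log once an hour
while sleep 3600; do
    date >> /var/tmp/mmfsd-rss.log
    pmap -x $(pidof mmfsd) | sort -n -k3 | tail -20 >> /var/tmp/mmfsd-rss.log
done

If the growth turns out to be in the anonymous heap rather than in the pagepool or the shared segments, that points at the daemon itself rather than at maxFilesToCache/maxStatCache sizing; smem (suggested elsewhere in the thread) gives a similar per-process view.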
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.huffman at crick.ac.uk Mon Jul 24 15:40:51 2017 From: adam.huffman at crick.ac.uk (Adam Huffman) Date: Mon, 24 Jul 2017 14:40:51 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1500906086.571.9.camel@qmul.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> <1500906086.571.9.camel@qmul.ac.uk> Message-ID: <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> smem is recommended here Cheers, Adam -- Adam Huffman Senior HPC and Cloud Systems Engineer The Francis Crick Institute 1 Midland Road London NW1 1AT T: 020 3796 1175 E: adam.huffman at crick.ac.uk W: www.crick.ac.uk On 24 Jul 2017, at 15:21, Peter Childs > wrote: top but ps gives the same value. [root at dn29 ~]# ps auww -q 4444 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 4444 2.7 22.3 10537600 5472580 ? S> wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. 
On Monday, July 24, 2017 5:29 AM, Peter Childs > wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From jamiedavis at us.ibm.com Mon Jul 24 15:45:26 2017 From: jamiedavis at us.ibm.com (James Davis) Date: Mon, 24 Jul 2017 14:45:26 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <33069.1500675853@turing-police.cc.vt.edu> References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: attlisjw.dat Type: application/octet-stream Size: 497 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jul 24 15:50:57 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 24 Jul 2017 14:50:57 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> Message-ID: I suppose the distinction between data, metadata and data IN metadata could be made. Whilst it is clear to me (us) now, perhaps the thought was that the data would be encrypted even if it was stored inside the metadata. My two pence. 
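As an aside, a quick way to tell whether a particular small file has actually landed in the inode (and therefore whether this caveat applies to it at all) is that such files typically report a non-zero size but zero allocated blocks. A rough check, with the path below only as an example:

# size in bytes vs. allocated 512-byte blocks; a non-zero size with 0 blocks
# suggests the data is being held in the inode rather than in data blocks
stat -c '%n: %s bytes, %b blocks' /gpfs/fs1/some-small-file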
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of James Davis Sent: 24 July 2017 15:45 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Hey all, On the documentation of encryption restrictions and encryption/HAWC interplay... The encryption documentation currently states: "Secure storage uses encryption to make data unreadable to anyone who does not possess the necessary encryption keys...Only data, not metadata, is encrypted." The HAWC restrictions include: "Encrypted data is never stored in the recovery log..." If this is unclear, I'm open to suggestions for improvements. Cordially, Jamie ----- Original message ----- From: valdis.kletnieks at vt.edu Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Date: Fri, Jul 21, 2017 6:24 PM On Fri, 21 Jul 2017 22:04:32 -0000, Sven Oehme said: > i talked with a few others to confirm this, but unfortunate this is a > limitation of the code today (maybe not well documented which we will look > into). Encryption only encrypts data blocks, it doesn't encrypt metadata. > Hence, if encryption is enabled, we don't store data in the inode, because > then it wouldn't be encrypted. For the same reason HAWC and encryption are > incompatible. I can live with that restriction if it's documented better, thanks... [Document Icon]attq4saq.dat Type: application/pgp-signature Name: attq4saq.dat _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Mon Jul 24 15:57:13 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Jul 2017 15:57:13 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <33069.1500675853@turing-police.cc.vt.edu> , <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <1500908233.4387.194.camel@buzzard.me.uk> On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: > Hey all, > > On the documentation of encryption restrictions and encryption/HAWC > interplay... > > The encryption documentation currently states: > > "Secure storage uses encryption to make data unreadable to anyone who > does not possess the necessary encryption keys...Only data, not > metadata, is encrypted." > > The HAWC restrictions include: > > "Encrypted data is never stored in the recovery log..." > > If this is unclear, I'm open to suggestions for improvements. > Just because *DATA* is stored in the metadata does not make it magically metadata. It's still data so you could quite reasonably conclude that it is encrypted. We have now been disabused of this, but the documentation is not clear and needs clarifying. Perhaps say metadata blocks are not encrypted. Or just a simple data stored in inodes is not encrypted would suffice. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Mon Jul 24 16:49:07 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Jul 2017 11:49:07 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? 
In-Reply-To: <1500896590.4387.167.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> Message-ID: <17702.1500911347@turing-police.cc.vt.edu> On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > For an archive service how about only accepting files in actual > "archive" formats and then severely restricting the number of files a > user can have? > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. After having dealt with users who fill up disk storage for almost 4 decades now, I'm fully aware of those advantages. :) ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" in 1978, and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ square feet, and now 8T drives are all over the place...) On the flip side, my current project is migrating 5 petabytes of data from our old archive system that didn't have such rules (mostly due to politics and the fact that the underlying XFS filesystem uses a 4K blocksize so it wasn't as big an issue), so I'm stuck with what people put in there years ago. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Mon Jul 24 16:49:26 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 24 Jul 2017 15:49:26 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: Ilan, you must create some type of authentication mechanism for CES to work properly first. If you want a quick and dirty way that would just use your local /etc/passwd try this. /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined Mark -----Original Message----- From: Ilan Schwarts [mailto:ilan84 at gmail.com] Sent: Monday, July 24, 2017 5:37 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication Hi, I have gpfs with 2 Nodes (redhat). I am trying to create NFS share - So I would be able to mount and access it from another linux machine. I receive error: Current authentication: none is invalid. What do i need to configure ? PLEASE NOTE: I dont have the SMB package at the moment, I dont want authentication on the NFS export.. While trying to create NFS (I execute the following): [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" I receive the following error: [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. 
[root at LH20-GPFS1 ~]# mmuserauth service list FILE access not configured PARAMETERS VALUES ------------------------------------------------- OBJECT access not configured PARAMETERS VALUES ------------------------------------------------- [root at LH20-GPFS1 ~]# Some additional information on cluster: ============================== [root at LH20-GPFS1 ~]# mmlsmgr file system manager node ---------------- ------------------ fs_gpfs01 10.10.158.61 (LH20-GPFS1) Cluster manager node: 10.10.158.61 (LH20-GPFS1) [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: LH20-GPFS1 GPFS cluster id: 10777108240438931454 GPFS UID domain: LH20-GPFS1 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 quorum _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions From valdis.kletnieks at vt.edu Mon Jul 24 17:35:34 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Jul 2017 12:35:34 -0400 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: <27469.1500914134@turing-police.cc.vt.edu> On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: > Hi, > I have gpfs with 2 Nodes (redhat). > I am trying to create NFS share - So I would be able to mount and > access it from another linux machine. > While trying to create NFS (I execute the following): > [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* > Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" You can get away with little to no authentication for NFSv3, but not for NFSv4. Try with Protocols=3 only and mmuserauth service create --type userdefined that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS client tells you". This of course only works sanely if each NFS export is only to a set of machines in the same administrative domain that manages their UID/GIDs. Exporting to two sets of machines that don't coordinate their UID/GID space is, of course, where hilarity and hijinks ensue.... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From luke.raimbach at googlemail.com Mon Jul 24 23:23:03 2017 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Mon, 24 Jul 2017 22:23:03 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> <1500906086.571.9.camel@qmul.ac.uk> <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> Message-ID: Switch of CCR and see what happens. On Mon, 24 Jul 2017, 15:40 Adam Huffman, wrote: > smem is recommended here > > Cheers, > Adam > > -- > > Adam Huffman > Senior HPC and Cloud Systems Engineer > The Francis Crick Institute > 1 Midland Road > London NW1 1AT > > T: 020 3796 1175 > E: adam.huffman at crick.ac.uk > W: www.crick.ac.uk > > > > > > On 24 Jul 2017, at 15:21, Peter Childs wrote: > > > top > > but ps gives the same value. > > [root at dn29 ~]# ps auww -q 4444 > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > root 4444 2.7 22.3 10537600 5472580 ? S /usr/lpp/mmfs/bin/mmfsd > > Thanks for the help > > Peter. > > > On Mon, 2017-07-24 at 14:10 +0000, Jim Doherty wrote: > > How are you identifying the high memory usage? > > > On Monday, July 24, 2017 9:30 AM, Peter Childs > wrote: > > > I've had a look at mmfsadm dump malloc and it looks to agree with the > output from mmdiag --memory. and does not seam to account for the excessive > memory usage. > > The new machines do have idleSocketTimout set to 0 from what your saying > it could be related to keeping that many connections between nodes working. > > Thanks in advance > > Peter. > > > > > [root at dn29 ~]# mmdiag --memory > > === mmdiag: memory === > mmfsd heap size: 2039808 bytes > > > Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") > 128 bytes in use > 17500049370 hard limit on memory usage > 1048576 bytes committed to regions > 1 number of regions > 555 allocations > 555 frees > 0 allocation failures > > > Statistics for MemoryPool id 2 ("Shared Segment") > 42179592 bytes in use > 17500049370 hard limit on memory usage > 56623104 bytes committed to regions > 9 number of regions > 100027 allocations > 79624 frees > 0 allocation failures > > > Statistics for MemoryPool id 3 ("Token Manager") > 2099520 bytes in use > 17500049370 hard limit on memory usage > 16778240 bytes committed to regions > 1 number of regions > 4 allocations > 0 frees > 0 allocation failures > > > On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: > > There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 > shared memory segments. To see the memory utilization of the shared > memory segments run the command mmfsadm dump malloc . The statistics > for memory pool id 2 is where maxFilesToCache/maxStatCache objects are > and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. > > You might want to upgrade to later PTF as there was a PTF to fix a memory > leak that occurred in tscomm associated with network connection drops. > > > On Monday, July 24, 2017 5:29 AM, Peter Childs > wrote: > > > We have two GPFS clusters. > > One is fairly old and running 4.2.1-2 and non CCR and the nodes run > fine using up about 1.5G of memory and is consistent (GPFS pagepool is > set to 1G, so that looks about right.) 
> > The other one is "newer" running 4.2.1-3 with CCR and the nodes keep > increasing in there memory usage, starting at about 1.1G and are find > for a few days however after a while they grow to 4.2G which when the > node need to run real work, means the work can't be done. > > I'm losing track of what maybe different other than CCR, and I'm trying > to find some more ideas of where to look. > > I'm checked all the standard things like pagepool and maxFilesToCache > (set to the default of 4000), workerThreads is set to 128 on the new > gpfs cluster (against default 48 on the old) > > I'm not sure what else to look at on this one hence why I'm asking the > community. > > Thanks in advance > > Peter Childs > ITS Research Storage > Queen Mary University of London. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > The Francis Crick Institute Limited is a registered charity in England and > Wales no. 1140062 and a company registered in England and Wales no. > 06885462, with its registered office at 1 Midland Road London NW1 1AT > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 25 05:52:11 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 25 Jul 2017 07:52:11 +0300 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: <27469.1500914134@turing-police.cc.vt.edu> References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). 
>> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From ulmer at ulmer.org Tue Jul 25 06:33:13 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Tue, 25 Jul 2017 01:33:13 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500908233.4387.194.camel@buzzard.me.uk> References: <33069.1500675853@turing-police.cc.vt.edu> <28986.1500671597@turing-police.cc.vt.edu> <1500908233.4387.194.camel@buzzard.me.uk> Message-ID: <1233C5A4-A8C9-4A56-AEC3-AE65DBB5D346@ulmer.org> > On Jul 24, 2017, at 10:57 AM, Jonathan Buzzard > wrote: > > On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: >> Hey all, >> >> On the documentation of encryption restrictions and encryption/HAWC >> interplay... >> >> The encryption documentation currently states: >> >> "Secure storage uses encryption to make data unreadable to anyone who >> does not possess the necessary encryption keys...Only data, not >> metadata, is encrypted." >> >> The HAWC restrictions include: >> >> "Encrypted data is never stored in the recovery log..." >> >> If this is unclear, I'm open to suggestions for improvements. >> > > Just because *DATA* is stored in the metadata does not make it magically > metadata. It's still data so you could quite reasonably conclude that it > is encrypted. > [?] > JAB. +1. Also, "Encrypted data is never stored in the recovery log?" does not make it clear whether: The data that is supposed to be encrypted is not written to the recovery log. The data that is supposed to be encrypted is written to the recovery log, but is not encrypted there. Thanks, -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Tue Jul 25 10:02:14 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Jul 2017 10:02:14 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <17702.1500911347@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> Message-ID: <1500973334.4387.201.camel@buzzard.me.uk> On Mon, 2017-07-24 at 11:49 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > > > For an archive service how about only accepting files in actual > > "archive" formats and then severely restricting the number of files a > > user can have? > > > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. 
> > After having dealt with users who fill up disk storage for almost 4 decades > now, I'm fully aware of those advantages. :) > > ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" in 1978, > and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ square feet, and > now 8T drives are all over the place...) > > On the flip side, my current project is migrating 5 petabytes of data from our > old archive system that didn't have such rules (mostly due to politics and the > fact that the underlying XFS filesystem uses a 4K blocksize so it wasn't as big > an issue), so I'm stuck with what people put in there years ago. I would be tempted to zip up the directories and move them ziped ;-) JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From john.hearns at asml.com Tue Jul 25 10:30:28 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 25 Jul 2017 09:30:28 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500973334.4387.201.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: I agree with Jonathan. In my experience, if you look at why there are many small files being stored by researchers, these are either the results of data acquisition - high speed cameras, microscopes, or in my experience a wind tunnel. Or the images are a sequence of images produced by a simulation which are later post-processed into a movie or Ensight/Paraview format. When questioned, the resaechers will always say "but I would like to keep this data available just in case". In reality those files are never looked at again. And as has been said if you have a tape based archiving system you could end up with thousands of small files being spread all over your tapes. So it is legitimate to make zips / tars of directories like that. I am intrigued to see that GPFS has a policy facility which can call an external program. That is useful. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Buzzard Sent: Tuesday, July 25, 2017 11:02 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? On Mon, 2017-07-24 at 11:49 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > > > For an archive service how about only accepting files in actual > > "archive" formats and then severely restricting the number of files > > a user can have? > > > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. > > After having dealt with users who fill up disk storage for almost 4 > decades now, I'm fully aware of those advantages. :) > > ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" > in 1978, and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ > square feet, and now 8T drives are all over the place...) > > On the flip side, my current project is migrating 5 petabytes of data > from our old archive system that didn't have such rules (mostly due to > politics and the fact that the underlying XFS filesystem uses a 4K > blocksize so it wasn't as big an issue), so I'm stuck with what people put in there years ago. I would be tempted to zip up the directories and move them ziped ;-) JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7Ce8a4016223414177bf9408d4d33bdb31%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=pean0PRBgJJmtbZ7TwO%2BxiSvhKsba%2FRGI9VUCxhp6kM%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From jonathan at buzzard.me.uk Tue Jul 25 12:22:49 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Jul 2017 12:22:49 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: <1500981769.4387.222.camel@buzzard.me.uk> On Tue, 2017-07-25 at 09:30 +0000, John Hearns wrote: > I agree with Jonathan. > > In my experience, if you look at why there are many small files being > stored by researchers, these are either the results of data acquisition > - high speed cameras, microscopes, or in my experience a wind tunnel. > Or the images are a sequence of images produced by a simulation which > are later post-processed into a movie or Ensight/Paraview format. When > questioned, the resaechers will always say "but I would like to keep > this data available just in case". In reality those files are never > looked at again. And as has been said if you have a tape based > archiving system you could end up with thousands of small files being > spread all over your tapes. So it is legitimate to make zips / tars of > directories like that. > Note that rules on data retention may require them to keep them for 10 years, so it is not unreasonable. Letting them spew thousands of files into an "archive" is not sensible. I was thinking of ways of getting the users to do it, and I guess leaving them with zero available file number quota in the new system would force them to zip up their data so they could add new stuff ;-) Archives in my view should have no quota on the space, only quota's on the number of files. Of course that might not be very popular. On reflection I think I would use a policy to restrict to files ending with .zip/.ZIP only. It's an archive and this format is effectively open source, widely understood and cross platform, and with the ZIP64 version will now stand the test of time too. Given it's an archive I would have a script that ran around setting all the files to immutable 7 days after creation too. 
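Something along those lines ought to be doable with the policy engine rather than a hand-rolled crawl; the sketch below is untested, and the rule names, script path and list-file parsing are my assumptions rather than anything official:

/* make-immutable.pol */
RULE EXTERNAL LIST 'toImmutable' EXEC '/usr/local/sbin/make_immutable.sh'
RULE 'weekOld' LIST 'toImmutable'
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME)) > 7

#!/bin/bash
# /usr/local/sbin/make_immutable.sh
# mmapplypolicy invokes this as: make_immutable.sh TEST|LIST <filelist> ...
# each filelist record is assumed to end with " -- <pathname>"
[ "$1" = "LIST" ] || exit 0
sed 's/^.* -- //' "$2" | while IFS= read -r f; do
    mmchattr -i yes "$f"
done

driven from cron with something like "mmapplypolicy /archive -P make-immutable.pol -I yes". Whether flipping the immutable flag is enough for the anti-fraud angle is a separate question, of course.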
Or maybe change the ownership and set a readonly ACL to the original user. Need to stop them changing stuff after the event if you are going to use to as part of your anti research fraud measures. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Tue Jul 25 17:11:45 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 25 Jul 2017 12:11:45 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500973334.4387.201.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: <88035.1500999105@turing-police.cc.vt.edu> On Tue, 25 Jul 2017 10:02:14 +0100, Jonathan Buzzard said: > I would be tempted to zip up the directories and move them ziped ;-) Not an option, unless you want to come here and re-write the researcher's tracking systems that knows where they archived a given run, and teach it "Except now it's in a .tar.gz in that directory, or perhaps one or two directories higher up, under some name". Yes, researchers do that. And as the joke goes: "What's the difference between a tenured professor and a terrorist?" "You can negotiate with a terrorist..." Plus remember that most of these directories are currently scattered across multiple tapes, which means "zip up a directory" may mean reading as many as 10 to 20 tapes just to get the directory on disk so you can zip it up. As it is, I had to write code that recall and processes all the files on tape 1, *wherever they are in the file system*, free them from the source disk, recall and process all the files on tape 2, repeat until tape 3,857. (And due to funding issues 5 years ago which turned into a "who paid for what tapes" food fight, most of the tapes ended up with files from entirely different file systems on them, going into different filesets on the destination). (And in fact, the migration is currently hosed up because a researcher *is* doing pretty much that - recalling all the files from one directory, then the next, then the next, to get files they need urgently for a deliverable but haven't been moved to the new system. So rather than having 12 LTO-5 drives to multistream the tape recalls, I've got 12 recalls fighting for one drive while the researcher's processing is hogging the other 11, due to the way the previous system prioritizes in-line opens of files versus bulk recalls) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From scbatche at us.ibm.com Tue Jul 25 21:46:45 2017 From: scbatche at us.ibm.com (Scott C Batchelder) Date: Tue, 25 Jul 2017 15:46:45 -0500 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf Message-ID: Hello: I am wondering if I can get some more information on the gpfsperf tool for baseline testing GPFS. I want to record GPFS read and write performance for a file system on the cluster before I enable DMAPI and configure the HSM interface. The README for the tool does not offer much insight in how I should run this tool based on the cluster or file system settings. The cluster that I will be running this tool on will not have MPI installed and will have multiple file systems in the cluster. Are there some best practises for running this tool? 
For example: - Should the number of threads equal the number of NSDs for the file system? or equal to the number of nodes? - If I execute a large multi-threaded run of this tool from a single node in the cluster, will that give me an accurate result of the performance of the file system? Any feedback is appreciated. Thanks. Sincerely, Scott Batchelder Phone: 1-281-883-7926 E-mail: scbatche at us.ibm.com 12301 Kurland Dr Houston, TX 77034-4812 United States -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 2022 bytes Desc: not available URL: From valdis.kletnieks at vt.edu Wed Jul 26 00:59:08 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 25 Jul 2017 19:59:08 -0400 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf In-Reply-To: References: Message-ID: <13777.1501027148@turing-police.cc.vt.edu> On Tue, 25 Jul 2017 15:46:45 -0500, "Scott C Batchelder" said: > - Should the number of threads equal the number of NSDs for the file > system? or equal to the number of nodes? Depends on what definition of "throughput" you are interested in. If your configuration has 50 clients banging on 5 NSD servers, your numbers for 5 threads and 50 threads are going to tell you subtly different things... (Basically, one thread per NSD is going to tell you the maximum that one client can expect to get with little to no contention, while one per client will tell you about the maximum *aggregate* that all 50 can get together - which is probably still giving each individual client less throughput than one-to-one....) We usually test with "exactly one thread total", "one thread per server", and "keep piling the clients on till the total number doesn't get any bigger". Also be aware that it only gives you insight to your workload performance if your workload is comprised of large file access - if your users are actually doing a lot of medium or small files, that changes the results dramatically as you end up possibly pounding on metadata more than the actual data.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From varun.mittal at in.ibm.com Wed Jul 26 04:42:27 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Wed, 26 Jul 2017 09:12:27 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. 
Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From varun.mittal at in.ibm.com Wed Jul 26 04:44:24 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Wed, 26 Jul 2017 09:14:24 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. 
this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Wed Jul 26 18:28:55 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Jul 2017 17:28:55 +0000 Subject: [gpfsug-discuss] Lost disks Message-ID: I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it's due to a back end disk issue or if it's a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn't appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren't 100% sure that something at the disk array couldn't have caused this. Is there an easy way to see if there is still data on these disks? Short of a full restore from backup what other options might they have? The mmlsnsd -X show's blanks for device and device type now. # mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- INGEST_FILEMGR_xis2301 0A23982E57FD995D - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2301 0A23982E57FD995D - - ingest-filemgr02.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - ingest-filemgr02.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2303 0A23982E57FD9962 - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. 
If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Wed Jul 26 18:37:45 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Wed, 26 Jul 2017 13:37:45 -0400 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf In-Reply-To: <13777.1501027148@turing-police.cc.vt.edu> References: <13777.1501027148@turing-police.cc.vt.edu> Message-ID: Hi Scott, >>- Should the number of threads equal the number of NSDs for the file system? or equal to the number of nodes? >>- If I execute a large multi-threaded run of this tool from a single node in the cluster, will that give me an accurate result of the performance of the file system? To add to Valdis's note, the answer to above also depends on the node, network used for GPFS communication between client and server, as well as storage performance capabilities constituting the GPFS cluster/network/storage stack. As an example, if the storage subsystem (including controller + disks) hosting the file-system can deliver ~20 GB/s and the networking between NSD client and server is FDR 56Gb/s Infiniband (with verbsRdma = ~6GB/s). Assuming, one FDR-IB link (verbsPorts) is configured per NSD server as well as client, then you could need minimum of 4 x NSD servers (4 x 6GB/s ==> 24 GB/s) to saturate the backend storage. So, you would need to run gpfsperf (or anyother parallel I/O benchmark) across minimum of 4 x GPFS NSD clients to saturate the backend storage. You can scale the gpfsperf thread counts (-th parameter) depending on access pattern (buffered/dio etc) but this would only be able to drive load from single NSD client node. If you would like to drive I/O load from multiple NSD client nodes + synchronize the parallel runs across multiple nodes for accuracy, then gpfsperf-mpi would be strongly recommended. You would need to use MPI to launch gpfsperf-mpi across multiple NSD client nodes and scale the MPI processes (across NSD clients with 1 or more MPI process per NSD client) accordingly to drive the I/O load for good performance. >>The cluster that I will be running this tool on will not have MPI installed and will have multiple file systems in the cluster. Without MPI, alternative would be to use ssh or pdsh to launch gpfsperf across multiple nodes however if there are slow NSD clients then the performance may not be accurate (slow clients taking longer and after faster clients finished it will get all the network/storage resources skewing the performance analysis. You may also consider using parallel Iozone as it can be run across multiple node using rsh/ssh with combination of "-+m" and "-t" option. http://iozone.org/docs/IOzone_msword_98.pdf ## -+m filename Use this file to obtain the configuration informati on of the clients for cluster testing. The file contains one line for each client. Each line has th ree fields. The fields are space delimited. 
A # sign in column zero is a comment line. The first fi eld is the name of the client. The second field is the path, on the client, for the working directory where Iozone will execute. The third field is the path, on the client, for the executable Iozone. To use this option one must be able to execute comm ands on the clients without being challenged for a password. Iozone will start remote execution by using ?rsh" To use ssh, export RSH=/usr/bin/ssh -t # Run Iozone in a throughput mode. This option allows the user to specify how many threads or processes to have active during th e measurement. ## Hope this helps, -Kums From: valdis.kletnieks at vt.edu To: gpfsug main discussion list Date: 07/25/2017 07:59 PM Subject: Re: [gpfsug-discuss] Baseline testing GPFS with gpfsperf Sent by: gpfsug-discuss-bounces at spectrumscale.org On Tue, 25 Jul 2017 15:46:45 -0500, "Scott C Batchelder" said: > - Should the number of threads equal the number of NSDs for the file > system? or equal to the number of nodes? Depends on what definition of "throughput" you are interested in. If your configuration has 50 clients banging on 5 NSD servers, your numbers for 5 threads and 50 threads are going to tell you subtly different things... (Basically, one thread per NSD is going to tell you the maximum that one client can expect to get with little to no contention, while one per client will tell you about the maximum *aggregate* that all 50 can get together - which is probably still giving each individual client less throughput than one-to-one....) We usually test with "exactly one thread total", "one thread per server", and "keep piling the clients on till the total number doesn't get any bigger". Also be aware that it only gives you insight to your workload performance if your workload is comprised of large file access - if your users are actually doing a lot of medium or small files, that changes the results dramatically as you end up possibly pounding on metadata more than the actual data.... [attachment "att0twxd.dat" deleted by Kumaran Rajaram/Arlington/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 26 18:45:35 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 26 Jul 2017 17:45:35 +0000 Subject: [gpfsug-discuss] Lost disks Message-ID: One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. 
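As a rough, read-only first check (the device name below is a placeholder for one of the affected LUNs), you can look at the first sectors to see whether a v1 descriptor string is still present at all, and compare that with what GPFS itself can read:

   dd if=/dev/dm-19 bs=512 count=256 2>/dev/null | strings | grep -i "NSD descriptor"
   mmfsadm test readdescraw /dev/dm-19

If the string and a sane readdescraw output survive on some LUNs but not on others, only part of the descriptors were overwritten, which is the case where manual reconstruction has a chance.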
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Jul 26 19:18:38 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 26 Jul 2017 18:18:38 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: it can happen for multiple reasons , one is a linux install, unfortunate there are significant more simpler explanations. Linux as well as BIOS in servers from time to time looks for empty disks and puts a GPT label on it if the disk doesn't have one, etc. this thread is explaining a lot of this : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014439222 this is why we implemented NSD V2 format long time ago , unfortunate there is no way to convert an V1 NSD to a V2 nsd on an existing filesytem except you remove the NSDs one at a time and re-add them after you upgraded the system to at least GPFS 4.1 (i would recommend a later version like 4.2.3) some more details are here in this thread : https://www.ibm.com/developerworks/community/forums/html/threadTopic?id=5c1ee5bc-41b8-4318-a74e-4d962f82ce2e but a quick summary of the benefits of V2 are : - ? Support for GPT NSD ? - Adds a standard disk partition table (GPT type) to NSDs ? - Disk label support for Linux ? - New GPFS NSD v2 format provides the following benefits: ? - Includes a partition table so that the disk is recognized as a GPFS device ? - Adjusts data alignment to support disks with a 4 KB physical block size ? - Adds backup copies of some key GPFS data structures ? - Expands some reserved areas to allow for future growth the main reason we can't convert from V1 to V2 is the on disk format changed significant so we would have to move on disk data which is very risky. hope that explains this. Sven On Wed, Jul 26, 2017 at 10:29 AM Mark Bush wrote: > I have a client has had an issue where all of the nsd disks disappeared in > the cluster recently. Not sure if it?s due to a back end disk issue or if > it?s a reboot that did it. But in their PMR they were told that all that > data is lost now and that the disk headers didn?t appear as GPFS disk > headers. How on earth could something like that happen? Could it be a > backend disk thing? They are confident that nobody tried to reformat disks > but aren?t 100% sure that something at the disk array couldn?t have caused > this. > > > > Is there an easy way to see if there is still data on these disks? > > Short of a full restore from backup what other options might they have? > > > > The mmlsnsd -X show?s blanks for device and device type now. 
> > > > # mmlsnsd -X > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > INGEST_FILEMGR_xis2301 0A23982E57FD995D - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2301 0A23982E57FD995D - - > ingest-filemgr02.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - > ingest-filemgr02.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2303 0A23982E57FD9962 - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > > > > > *Mark* > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > Sirius Computer Solutions > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Jul 26 19:19:15 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Jul 2017 18:19:15 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. 
But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 26 20:05:59 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 26 Jul 2017 19:05:59 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: IBM has a procedure for it that may work in some cases, but you?re manually editing the NSD descriptors on disk. Contact IBM if you think an NSD has been lost to descriptor being re-written. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 1:19 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Jul 27 11:39:28 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 27 Jul 2017 10:39:28 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: Mark, I once rescued a system which had the disk partition on the OS disks deleted. (This was a system with a device mapper RAID pair of OS disks). Download a copy of sysrescue http://www.system-rescue-cd.org/ and create a bootable USB stick (or network boot). When you boot the system in sysrescue it has a utility to scan disks which will identify existing partitions, even if the partition table has been erased. I can?t say if this will do anything with the disks in your system, but this is certainly worth a try if you suspect that the data is all still on disk. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: Wednesday, July 26, 2017 8:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. 
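Whichever rescue route is tried, a cheap precaution before any tool or manual edit touches the LUNs is to set aside a copy of the first few megabytes of every suspect disk, so a later attempt can be backed out. A minimal sketch (it only reads the disks, apart from the image files it writes; paths and device names are placeholders):

   for d in /dev/dm-19 /dev/dm-20
   do
       # 4 MiB comfortably covers the descriptor area at the start of the disk
       dd if=$d of=/root/nsdhdr.$(basename $d).img bs=1M count=4
   done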
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Thu Jul 27 11:58:08 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 11:58:08 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: <1501153088.26563.39.camel@buzzard.me.uk> On Wed, 2017-07-26 at 17:45 +0000, Oesterlin, Robert wrote: > One way this could possible happen would be a system is being > installed (I?m assuming this is Linux) and the FC adapter is active; > then the OS install will see disks and wipe out the NSD descriptor on > those disks. (Which is why the NSD V2 format was invented, to prevent > this from happening) If you don?t lose all of the descriptors, it?s > sometimes possible to manually re-construct the missing header > information - I?m assuming since you opened a PMR, IBM has looked at > this. This is a scenario I?ve had to recover from - twice. Back-end > array issue seems unlikely to me, I?d keep looking at the systems with > access to those LUNs and see what commands/operations could have been > run. I would concur that this is the most likely scenario; an install where for whatever reason the machine could see the disks and they are gone. I know that RHEL6 and its derivatives will do that for you. Has happened to me at previous place of work where another admin forgot to de-zone a server, went to install CentOS6 as part of a cluster upgrade from CentOS5 and overwrote all the NSD descriptors. Thing is GPFS does not look at the NSD descriptors that much. So in my case it was several days before it was noticed, and only then because I rebooted the last NSD server as part of a rolling upgrade of GPFS. I could have cruised for weeks/months with no NSD descriptors if I had not restarted all the NSD servers. The moral of this is the overwrite could have take place quite some time ago. Basically if the disks are all missing then the NSD descriptor has been overwritten, and the protestations of the client are irrelevant. The chances of the disk array doing it to *ALL* the disks is somewhere around ? IMHO. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From richard.rupp at us.ibm.com Thu Jul 27 12:28:35 2017 From: richard.rupp at us.ibm.com (RICHARD RUPP) Date: Thu, 27 Jul 2017 07:28:35 -0400 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: If you are under IBM support, leverage IBM for help. A third party utility has the possibility of making it worse. From: John Hearns To: gpfsug main discussion list Date: 07/27/2017 06:40 AM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org Mark, I once rescued a system which had the disk partition on the OS disks deleted. (This was a system with a device mapper RAID pair of OS disks). Download a copy of sysrescue http://www.system-rescue-cd.org/ and create a bootable USB stick (or network boot). When you boot the system in sysrescue it has a utility to scan disks which will identify existing partitions, even if the partition table has been erased. I can?t say if this will do anything with the disks in your system, but this is certainly worth a try if you suspect that the data is all still on disk. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: Wednesday, July 26, 2017 8:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. 
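To narrow down where and when the overwrite happened, a read-only inventory from each node that can see the LUNs shows what is sitting on them now (device names are placeholders; nothing here writes to disk):

   lsblk -f
   blkid /dev/dm-19
   wipefs /dev/dm-19     # with no options wipefs only lists signatures, it does not erase

A fresh GPT, LVM or filesystem signature on what used to be a raw v1 NSD points at the install/relabel scenario discussed above rather than at the disk array.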
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush < Mark.Bush at siriuscom.com> Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
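Looking past the recovery itself: the one-disk-at-a-time cycle Sven describes is how existing v1 NSDs get onto the v2 format that was introduced to prevent exactly this. A rough sketch of one iteration, with file system, NSD, server and device names as placeholders (enough free capacity is needed because mmdeldisk migrates data off the disk first), using a stanza file nsd01.stanza containing:

   %nsd:
     device=/dev/sdX
     nsd=nsd_new01
     servers=nsdsrv01,nsdsrv02
     usage=dataAndMetadata
     failureGroup=1

and then:

   mmdeldisk fs_gpfs01 nsd_old01        # drains data off the disk, then removes it from the file system
   mmdelnsd nsd_old01                   # frees the old NSD definition
   mmcrnsd -F nsd01.stanza              # per Sven's note, on 4.1+ code levels the new NSD is created in v2 format
   mmadddisk fs_gpfs01 -F nsd01.stanza  # add it back to the file system
   mmrestripefs fs_gpfs01 -b            # optional rebalance afterwards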
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Thu Jul 27 12:58:50 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 12:58:50 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: <1501156730.26563.49.camel@strath.ac.uk> On Thu, 2017-07-27 at 07:28 -0400, RICHARD RUPP wrote: > If you are under IBM support, leverage IBM for help. A third party > utility has the possibility of making it worse. > The chances of recovery are slim in the first place from this sort of problem. At least with v1 NSD descriptors. Further IBM have *ALREADY* told him the data is lost, I quote But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. So in this scenario you have little to loose trying something because you are now on your own. Worst case scenario is that whatever you try does not work, which leave you no worse of than you are now. Well apart from lost time for the restore, but you might have started that already to somewhere else. I was once told by IBM (nine years ago now) that my GPFS file system was caput and to arrange a restore from tape. At which point some fiddling by myself fixed the problem and a 100TB restore was no longer required. However this was not due to overwritten NSD descriptors. When that happened the two file systems effected had to be restored. Well bizarrely one was still mounted and I was able to rsync the data off. However the point is that at this stage fiddling with third party tools is the only option left. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From UWEFALKE at de.ibm.com Thu Jul 27 15:18:02 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 27 Jul 2017 16:18:02 +0200 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501156730.26563.49.camel@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> Message-ID: "Just doing something" makes things worse usually. Whether a 3rd party tool knows how to handle GPFS NSDs can be doubted (as long as it is not dedicated to that purpose). First, I'd look what is actually on the sectors where the NSD headers used to be, and try to find whether data beyond that area were also modified (if the latter is the case, restoring the NSDs does not make much sense as data and/or metadata (depending on disk usage) would also be corrupted. If you are sure that just the NSD header area has been affected, you might try to trick GPFS in getting just the information into the header area needed that GPFS recognises the devices as the NSDs they were. 
The first 4 kiB of a v1 NSD from a VM on my laptop look like $ cat nsdv1head | od --address-radix=x -xc 000000 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000200 cf70 4192 0000 0100 0000 3000 e930 a028 p 317 222 A \0 \0 \0 001 \0 \0 \0 0 0 351 ( 240 000210 a8c0 ce7a a251 1f92 a251 1a92 0000 0800 300 250 z 316 Q 242 222 037 Q 242 222 032 \0 \0 \0 \b 000220 0000 f20f 0000 0000 0000 0000 0000 0000 \0 \0 017 362 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000230 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000400 93d2 7885 0000 0100 0000 0002 141e 64a8 322 223 205 x \0 \0 \0 001 \0 \0 002 \0 036 024 250 d 000410 a8c0 ce7a a251 3490 0000 fa0f 0000 0800 300 250 z 316 Q 242 220 4 \0 \0 017 372 \0 \0 \0 \b 000420 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000480 534e 2044 6564 6373 6972 7470 726f 6620 N S D d e s c r i p t o r f 000490 726f 2f20 6564 2f76 6476 2062 7263 6165 o r / d e v / v d b c r e a 0004a0 6574 2064 7962 4720 4650 2053 6f4d 206e t e d b y G P F S M o n 0004b0 614d 2079 3732 3020 3a30 3434 303a 2034 M a y 2 7 0 0 : 4 4 : 0 4 0004c0 3032 3331 000a 0000 0000 0000 0000 0000 2 0 1 3 \n \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 0004d0 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000e00 4c5f 4d56 0000 017d 0000 017d 0000 017d _ L V M \0 \0 } 001 \0 \0 } 001 \0 \0 } 001 000e10 0000 017d 0000 0000 0000 0000 0000 0000 \0 \0 } 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000e20 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000e30 0000 0000 0000 0000 0000 0000 017d 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 } 001 \0 \0 000e40 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 001000 I suppose, the important area starts at 0x0200 (ie. with the second 512Byte sector) and ends at 0x04df (which would be within the 3rd 512Bytes sector, hence the 2nd and 3rd sectors appear crucial). I think that there is some more space before the payload area starts. Without knowledge what exactly has to go into the header, I'd try to create an NSD on one or two (new) disks, save the headers, then create an FS on them, save the headers again, check if anything has changed. So, creating some new NSDs, checking what keys might appear there and in the cluster configuration could get you very close to craft the header information which is gone. Of course, that depends on how dear the data on the gone FS AKA SG are and how hard it'd be to rebuild them otherwise (replay from backup, recalculate, ...) It seems not a bad idea to set aside the NSD headers of your NSDs in a back up :-) And also now: Before amending any blocks on your disks, save them! Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/27/2017 01:59 PM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org On Thu, 2017-07-27 at 07:28 -0400, RICHARD RUPP wrote: > If you are under IBM support, leverage IBM for help. A third party > utility has the possibility of making it worse. > The chances of recovery are slim in the first place from this sort of problem. At least with v1 NSD descriptors. Further IBM have *ALREADY* told him the data is lost, I quote But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. So in this scenario you have little to loose trying something because you are now on your own. Worst case scenario is that whatever you try does not work, which leave you no worse of than you are now. Well apart from lost time for the restore, but you might have started that already to somewhere else. I was once told by IBM (nine years ago now) that my GPFS file system was caput and to arrange a restore from tape. At which point some fiddling by myself fixed the problem and a 100TB restore was no longer required. However this was not due to overwritten NSD descriptors. When that happened the two file systems effected had to be restored. Well bizarrely one was still mounted and I was able to rsync the data off. However the point is that at this stage fiddling with third party tools is the only option left. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Thu Jul 27 16:09:31 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 16:09:31 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: <1501156730.26563.49.camel@strath.ac.uk> Message-ID: <1501168171.26563.56.camel@strath.ac.uk> On Thu, 2017-07-27 at 16:18 +0200, Uwe Falke wrote: > "Just doing something" makes things worse usually. Whether a 3rd > party tool knows how to handle GPFS NSDs can be doubted (as long as it > is not dedicated to that purpose). It might usually, but IBM have *ALREADY* given up in this case and told the customer their data is toast. Under these circumstances other than wasting time that could have been spent profitably on a restore it is *IMPOSSIBLE* to make the situation worse. [SNIP] > It seems not a bad idea to set aside the NSD headers of your NSDs in a > back up :-) > And also now: Before amending any blocks on your disks, save them! > It's called NSD v2 descriptor format, so rather than use raw disks they are in a GPT partition, and for good measure a backup copy is stored at the end of the disk too. Personally if I had any v1 NSD's in a file system I would have a plan for a series of mmdeldisk/mmcrnsd/mmadddisk to get them all to v2 sooner rather than later. JAB. -- Jonathan A. 
Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Robert.Oesterlin at nuance.com Thu Jul 27 16:28:02 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 27 Jul 2017 15:28:02 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format each is? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jul 27 16:51:29 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 27 Jul 2017 17:51:29 +0200 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501168171.26563.56.camel@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> Message-ID: gpfsug-discuss-bounces at spectrumscale.org wrote on 07/27/2017 05:09:31 PM: > From: Jonathan Buzzard > To: gpfsug main discussion list > Date: 07/27/2017 05:09 PM > Subject: Re: [gpfsug-discuss] Lost disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On Thu, 2017-07-27 at 16:18 +0200, Uwe Falke wrote: > > > "Just doing something" makes things worse usually. Whether a 3rd > > party tool knows how to handle GPFS NSDs can be doubted (as long as it > > is not dedicated to that purpose). > > It might usually, but IBM have *ALREADY* given up in this case and told > the customer their data is toast. Under these circumstances other than > wasting time that could have been spent profitably on a restore it is > *IMPOSSIBLE* to make the situation worse. SCNR: It is always possible to make things worse. However, of course, if the efforts to do research on that system appear too expensive compared to the possible gain, then it is wise to give up and restore data from backup to a new file system. > > [SNIP] > > > It seems not a bad idea to set aside the NSD headers of your NSDs in a > > back up :-) > > And also now: Before amending any blocks on your disks, save them! > > > > It's called NSD v2 descriptor format, so rather than use raw disks they > are in a GPT partition, and for good measure a backup copy is stored at > the end of the disk too. > > Personally if I had any v1 NSD's in a file system I would have a plan > for a series of mmdeldisk/mmcrnsd/mmadddisk to get them all to v2 sooner > rather than later. Yep, but I suppose the gone NSDs were v1. Then, there might be some restrictions blocking the move from NSDv1 to NSDv2 (old FS level still req.ed, or just the hugeness of a file system). And you never know, if some tool runs wild due to logical failures it overwrites all GPT copies on a disk and you're lost again (but of course NSDv2 has been a tremendous step ahead). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From luke.raimbach at googlemail.com Thu Jul 27 17:09:42 2017 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Thu, 27 Jul 2017 16:09:42 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> References: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: mmfsadm test readdescraw On Thu, 27 Jul 2017, 16:28 Oesterlin, Robert, wrote: > I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format > each is? > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Jul 27 17:17:20 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 27 Jul 2017 16:17:20 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <50669E00-32A8-4AC7-A729-CB961F96ECAE@nuance.com> Right - but what field do I look at? Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Luke Raimbach Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 11:10 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? mmfsadm test readdescraw -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Jul 27 19:26:45 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 19:26:45 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> Message-ID: <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> On 27/07/17 16:51, Uwe Falke wrote: [SNIP] > SCNR: It is always possible to make things worse. > However, of course, if the efforts to do research on that system appear > too expensive compared to the possible gain, then it is wise to give up > and restore data from backup to a new file system. > Explain to me when IBM have washed their hands of the situation; that is they deem the file system unrecoverable and will take no further action to help the customer, how under these circumstances it is possible for it to get any worse attempting to recover the situation yourself? The answer is you can't so and are talking complete codswallop. In general you are right, in this situation you are utterly and totally wrong. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From chair at spectrumscale.org Thu Jul 27 21:19:15 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Thu, 27 Jul 2017 21:19:15 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> Message-ID: Guys, this is supposed to be a community mailing list where people can come and ask questions and we can have healthy debate, but please can we keep it calm? Thanks Simon Group Chair From sfadden at us.ibm.com Thu Jul 27 21:33:19 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Thu, 27 Jul 2017 20:33:19 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: References: , <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Jul 28 00:29:47 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 28 Jul 2017 00:29:47 +0100 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> References: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: On 27/07/17 16:28, Oesterlin, Robert wrote: > I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format > each is? Well on anything approaching a recent Linux lsblk should as I understand it should show GPT partitions on v2 NSD's. Normally a v1 NSD would show up as a raw block device. I guess you could have created the v1 NSD's inside a partition but that was not normal practice. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From daniel.kidger at uk.ibm.com Fri Jul 28 12:03:40 2017 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Fri, 28 Jul 2017 11:03:40 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: , <1501156730.26563.49.camel@strath.ac.uk><1501168171.26563.56.camel@strath.ac.uk><3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 28 12:46:47 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 28 Jul 2017 11:46:47 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Scott Fadden Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 3:33 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? 
# mmfsadm test readdescraw /dev/dm-14 | grep " original format" original format version 1600, cur version 1700 (mgr 1700, helper 1700, mnode 1700) The harder part is what version number = v2 and what matches version 1. The real answer is there is not a simple one, it is not really v1 vs v2 it is what feature you are interested in. Just one small example 4K Disk SECTOR support started in 1403 Dynamically enabling quotas started in 1404 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 28 13:44:11 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 28 Jul 2017 12:44:11 +0000 Subject: [gpfsug-discuss] LROC example Message-ID: <8103C497-EFA2-41E3-A047-4C3A3AA3EC0B@nuance.com> For those of you considering LROC, you may find this interesting. LROC can be very effective in some job mixes, as shown below. This is in a compute cluster of about 400 nodes. Each compute node has a 100GB LROC. In this particular job mix, LROC was recalling 3-4 times the traffic that was going to the NSDs. I see other cases where?s it?s less effective. [cid:image001.png at 01D30775.4ACF3D20] Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 54425 bytes Desc: image001.png URL: From knop at us.ibm.com Fri Jul 28 13:44:26 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 28 Jul 2017 08:44:26 -0400 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> References: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Message-ID: Bob, I believe the NSD format version (v1 vs v2) is shown in the " format version" line that starts with "NSDid" : # mmfsadm test readdescraw /dev/dm-11 NSD descriptor in sector 64 of /dev/dm-11 NSDid: 9461C0A85788693A format version: 1403 Label: It should say "1403" when the format is v2. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 07/28/2017 07:47 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Scott Fadden Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 3:33 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? 
# mmfsadm test readdescraw /dev/dm-14 | grep " original format" original format version 1600, cur version 1700 (mgr 1700, helper 1700, mnode 1700) The harder part is what version number = v2 and what matches version 1. The real answer is there is not a simple one, it is not really v1 vs v2 it is what feature you are interested in. Just one small example 4K Disk SECTOR support started in 1403 Dynamically enabling quotas started in 1404 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gcorneau at us.ibm.com Fri Jul 28 20:07:54 2017 From: gcorneau at us.ibm.com (Glen Corneau) Date: Fri, 28 Jul 2017 14:07:54 -0500 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: References: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Message-ID: Just a note for my AIX folks out there (and I know there's at least one!): When NSDv2 (version 1403) disks are defined in AIX we *don't* create GPTs on those LUNs. However with GPFS (Spectrum Scale) installed on AIX we will place the NSD name in the "VG" column of lsvg. But yes, we've had situations of customers creating new VGs on existing GPFS LUNs (force!) and destroying file systems. ------------------ Glen Corneau Power Systems Washington Systems Center gcorneau at us.ibm.com From: "Felipe Knop" To: gpfsug main discussion list Date: 07/28/2017 07:45 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Bob, I believe the NSD format version (v1 vs v2) is shown in the " format version" line that starts with "NSDid" : # mmfsadm test readdescraw /dev/dm-11 NSD descriptor in sector 64 of /dev/dm-11 NSDid: 9461C0A85788693A format version: 1403 Label: It should say "1403" when the format is v2. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 07/28/2017 07:47 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Sun Jul 30 04:22:25 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Sat, 29 Jul 2017 23:22:25 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? 
In-Reply-To: <1500908233.4387.194.camel@buzzard.me.uk> References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> <1500908233.4387.194.camel@buzzard.me.uk> Message-ID: Jonathan, all, We'll be introducing some clarification into the publications to highlight that data is not stored in the inode for encrypted files. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/24/2017 10:57 AM Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Sent by: gpfsug-discuss-bounces at spectrumscale.org On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: > Hey all, > > On the documentation of encryption restrictions and encryption/HAWC > interplay... > > The encryption documentation currently states: > > "Secure storage uses encryption to make data unreadable to anyone who > does not possess the necessary encryption keys...Only data, not > metadata, is encrypted." > > The HAWC restrictions include: > > "Encrypted data is never stored in the recovery log..." > > If this is unclear, I'm open to suggestions for improvements. > Just because *DATA* is stored in the metadata does not make it magically metadata. It's still data so you could quite reasonably conclude that it is encrypted. We have now been disabused of this, but the documentation is not clear and needs clarifying. Perhaps say metadata blocks are not encrypted. Or just a simple data stored in inodes is not encrypted would suffice. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Jul 31 05:57:44 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 31 Jul 2017 00:57:44 -0400 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501153088.26563.39.camel@buzzard.me.uk> References: <1501153088.26563.39.camel@buzzard.me.uk> Message-ID: Jonathan, Regarding >> Thing is GPFS does not look at the NSD descriptors that much. So in my >> case it was several days before it was noticed, and only then because I >> rebooted the last NSD server as part of a rolling upgrade of GPFS. I >> could have cruised for weeks/months with no NSD descriptors if I had not >> restarted all the NSD servers. The moral of this is the overwrite could >> have take place quite some time ago. While GPFS does not normally read the NSD descriptors in the course of performing file system operations, as of 4.1.1 a periodic check is done on the content of various descriptors, and a message like [E] On-disk NSD descriptor of is valid but has a different ID. 
ID in cache is and ID on-disk is should get issued if the content of the descriptor on disk changes. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/27/2017 06:58 AM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, 2017-07-26 at 17:45 +0000, Oesterlin, Robert wrote: > One way this could possible happen would be a system is being > installed (I?m assuming this is Linux) and the FC adapter is active; > then the OS install will see disks and wipe out the NSD descriptor on > those disks. (Which is why the NSD V2 format was invented, to prevent > this from happening) If you don?t lose all of the descriptors, it?s > sometimes possible to manually re-construct the missing header > information - I?m assuming since you opened a PMR, IBM has looked at > this. This is a scenario I?ve had to recover from - twice. Back-end > array issue seems unlikely to me, I?d keep looking at the systems with > access to those LUNs and see what commands/operations could have been > run. I would concur that this is the most likely scenario; an install where for whatever reason the machine could see the disks and they are gone. I know that RHEL6 and its derivatives will do that for you. Has happened to me at previous place of work where another admin forgot to de-zone a server, went to install CentOS6 as part of a cluster upgrade from CentOS5 and overwrote all the NSD descriptors. Thing is GPFS does not look at the NSD descriptors that much. So in my case it was several days before it was noticed, and only then because I rebooted the last NSD server as part of a rolling upgrade of GPFS. I could have cruised for weeks/months with no NSD descriptors if I had not restarted all the NSD servers. The moral of this is the overwrite could have take place quite some time ago. Basically if the disks are all missing then the NSD descriptor has been overwritten, and the protestations of the client are irrelevant. The chances of the disk array doing it to *ALL* the disks is somewhere around ? IMHO. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Mon Jul 31 18:30:34 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 31 Jul 2017 17:30:34 +0000 Subject: [gpfsug-discuss] Auditing Message-ID: Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? 
Am I barking up the wrong tree for this, or is there a better way to get this type of data from a Spectrum Scale filesystem? Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL:
From Renar.Grunenberg at huk-coburg.de Mon Jul 31 18:44:21 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 17:44:21 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement Message-ID: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Hallo All, we are on Version 4.2.3.2 and see some misunderstanding in the enforcement of hard limit definitions on a fileset quota. What we see: we put some 200 GB files onto the following quota definitions: quota 150 GB, limit 250 GB, grace none. After creating the first 200 GB file we hit the soft quota limit, that's OK. But after the second file was created we expected an I/O error, and it didn't happen. We defined all the well-known parameters (-Q, ...) on the filesystem. Is this a bug or a feature? mmcheckquota was already run beforehand. Regards Renar. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL:
From eric.wonderley at vt.edu Mon Jul 31 18:54:52 2017 From: eric.wonderley at vt.edu (J.
Eric Wonderley) Date: Mon, 31 Jul 2017 13:54:52 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar < Renar.Grunenberg at huk-coburg.de> wrote: > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the enforcement > of hardlimit definitions on a flieset quota. What we see is we put some 200 > GB files on following quota definitions: quota 150 GB Limit 250 GB Grace > none. > After the creating of one 200 GB we hit the softquota limit, thats ok. But > After the the second file was created!! we expect an io error but it don?t > happen. We define all well know Parameters (-Q,..) on the filesystem . Is > this a bug or a Feature? mmcheckquota are already running at first. > Regards Renar. > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ------------------------------ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > ------------------------------ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfosburg at mdanderson.org Mon Jul 31 18:56:46 2017 From: jfosburg at mdanderson.org (Fosburgh,Jonathan) Date: Mon, 31 Jul 2017 17:56:46 +0000 Subject: [gpfsug-discuss] Auditing In-Reply-To: References: Message-ID: At present there is not a method to audit file access. Jonathan Fosburgh Principal Application Systems Analyst Storage Team IT Operations jfosburg at mdanderson.org (713) 745-9346 On 07/31/2017 12:30 PM, Mark Bush wrote: Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? 
Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The information contained in this e-mail message may be privileged, confidential, and/or protected from disclosure. This e-mail message may contain protected health information (PHI); dissemination of PHI should comply with applicable federal and state laws. If you are not the intended recipient, or an authorized representative of the intended recipient, any further review, disclosure, use, dissemination, distribution, or copying of this message or any attachment (or the information contained therein) is strictly prohibited. If you think that you have received this e-mail message in error, please notify the sender by return e-mail and delete all references to it and its contents from your systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jul 31 19:02:30 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 31 Jul 2017 18:02:30 +0000 Subject: [gpfsug-discuss] Re Auditing Message-ID: We run a policy that looks like this: -- cut here -- define(daysToEpoch, days(timestamp('1970-01-01 00:00:00.0'))) define(unixTS, char(int( (( days(\$1) - daysToEpoch ) * 86400) + ( hour(\$1) * 3600) + (minute(\$1) * 60) + (second(\$1)) )) ) rule 'dumpall' list '"$filesystem"' DIRECTORIES_PLUS SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' ) -- cut here -- Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Monday, July 31, 2017 at 12:31 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Auditing Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Mon Jul 31 19:05:37 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 31 Jul 2017 18:05:37 +0000 Subject: [gpfsug-discuss] Re Auditing In-Reply-To: References: Message-ID: Brilliant. Thanks Bob. 
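A minimal sketch of one way to run such a list policy, for anyone who wants to try it: the policy file path, the list name 'audit', the output prefix and the device name fs1 below are placeholders, and depending on the release the LIST rule may need a matching EXTERNAL LIST rule with an empty EXEC so that mmapplypolicy simply writes the records to a file (the backslashes in \$1 look like shell escaping from a here-document and would not be needed in a standalone policy file).

-- cut here --
/* no-op external list: pairs with the LIST rule above (rename that list to 'audit', or make the names match) */
RULE EXTERNAL LIST 'audit' EXEC ''
-- cut here --

# save the define()/rule text above as /tmp/audit.pol, then scan without invoking any external script:
mmapplypolicy fs1 -P /tmp/audit.pol -I defer -f /tmp/audit
# the matched records ('|'-separated, one line per file) should land in a file named like /tmp/audit.list.audit

Note that the SHOW() fields capture ownership and timestamps as of the scan, so this gives a point-in-time inventory rather than a log of individual open/read/delete events.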
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Monday, July 31, 2017 1:03 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Re Auditing We run a policy that looks like this: -- cut here -- define(daysToEpoch, days(timestamp('1970-01-01 00:00:00.0'))) define(unixTS, char(int( (( days(\$1) - daysToEpoch ) * 86400) + ( hour(\$1) * 3600) + (minute(\$1) * 60) + (second(\$1)) )) ) rule 'dumpall' list '"$filesystem"' DIRECTORIES_PLUS SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' ) -- cut here -- Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Monday, July 31, 2017 at 12:31 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Auditing Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Jul 31 19:26:52 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 31 Jul 2017 14:26:52 -0400 Subject: [gpfsug-discuss] Re Auditing - timestamps In-Reply-To: References: Message-ID: The "ILM" chapter in the Admin Guide has some tips, among which: 18. You can convert a time interval value to a number of seconds with the SQL cast syntax, as in the following example: define([toSeconds],[(($1) SECONDS(12,6))]) define([toUnixSeconds],[toSeconds($1 - ?1970-1-1 at 0:00?)]) RULE external list b RULE list b SHOW(?sinceNow=? toSeconds(current_timestamp-modification_time) ) RULE external list c RULE list c SHOW(?sinceUnixEpoch=? toUnixSeconds(modification_time) ) The following method is also supported: define(access_age_in_days,( INTEGER(( (CURRENT_TIMESTAMP - ACCESS_TIME) SECONDS)) /(24*3600.0) ) ) RULE external list w exec ?? RULE list w weight(access_age_in_days) show(access_age_in_days) --marc of GPFS -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Mon Jul 31 19:46:53 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 31 Jul 2017 14:46:53 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: <20170731144653.160355y5whmerokd@support.scinet.utoronto.ca> Renar For as long as the usage is below the hard limit (space or inodes) and below the grace period you'll be able to write. I don't think you can set the grace period to an specific value as a quota parameter, such as none. That is set at the filesystem creation time. BTW, grace period limit has been a mystery to me for many years. My impression is that GPFS keeps changing it internally depending on the position of the moon. I think ours is 2 hours, but at times I can see users writing for longer. Jaime Quoting "Grunenberg, Renar" : > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the > enforcement of hardlimit definitions on a flieset quota. What we see > is we put some 200 GB files on following quota definitions: quota > 150 GB Limit 250 GB Grace none. > After the creating of one 200 GB we hit the softquota limit, thats > ok. But After the the second file was created!! we expect an io > error but it don?t happen. We define all well know Parameters > (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota > are already running at first. > Regards Renar. > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
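To make the soft limit / hard limit / grace interplay concrete, here is a minimal sketch of setting and checking a fileset block quota; the device name fs1 and fileset name projA are placeholders, and the syntax assumes the newer mmsetquota form:

# soft limit 150 GB, hard limit 250 GB for the fileset
mmsetquota fs1:projA --block 150G:250G
# show usage, limits, in_doubt and grace for that fileset
mmlsquota -j projA fs1

Writes are only refused once the hard limit is reached, or once the grace period has expired while usage sits above the soft limit; in between, the fileset is merely over quota and the grace clock is running.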
From Renar.Grunenberg at huk-coburg.de Mon Jul 31 20:04:56 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 19:04:56 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hallo J. Eric, hallo Jaime, OK, after we hit the soft limit we see that the grace period goes to 7 days. I think that's the default. But what does it mean? After we reach the "hard" limit we additionally see the GBytes in_doubt. My interpretation is that we can now write many GB, up to the no-space-left event in the filesystem. But our intention is to restrict some applications to write only up to the hard limit in the fileset. Any hints on how to accomplish this? Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr.
9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 31 20:21:46 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 31 Jul 2017 19:21:46 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar, I?m sure this is the case, but I don?t see anywhere in this thread where this is explicitly stated ? you?re not doing your tests as root, are you? root, of course, is not bound by any quotas. Kevin On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > wrote: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. 
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Mon Jul 31 20:30:20 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 19:30:20 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hallo Kevin, thanks for your hint i will check these tomorrow, and yes as root, lol. Regards Renar Renar Grunenberg Abteilung Informatik ? 
Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Buterbaugh, Kevin L Gesendet: Montag, 31. Juli 2017 21:22 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar, I?m sure this is the case, but I don?t see anywhere in this thread where this is explicitly stated ? you?re not doing your tests as root, are you? root, of course, is not bound by any quotas. Kevin On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > wrote: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. 
If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Mon Jul 31 21:03:53 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 31 Jul 2017 16:03:53 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> In addition, the in_doubt column is a function of the data turn-over and the internal gpfs accounting synchronization period (beyond root control). The higher the in_doubt values, the less accurate the reported amount of space/inodes a user/group/fileset has in the filesystem. What I noticed in practice is that the in_doubt values only get worse over time, and work against the quotas, making users hit the limits sooner. Therefore, you may wish to run an 'mmcheckquota' crontab job once or twice a day, to reset the in_doubt column to zero more often. GPFS has a very high lag doing this on its own in the most recent versions, and seldom really catches up on a very active filesystem. If your grace period is set to 7 days I can assure you that in an HPC environment it is effectively the equivalent of not having quotas. You should set it to 2 hours or 4 hours. In an environment such as ours a runaway process can easily generate 500TB of data or 1 billion inodes in a few hours, and choke the file system for all users/jobs. Jaime Quoting "Buterbaugh, Kevin L" : > Hi Renar, > > I?m sure this is the case, but I don?t see anywhere in this thread > where this is explicitly stated ? you?re not doing your tests as > root, are you? root, of course, is not bound by any quotas. > > Kevin > > On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > > > wrote: > > > Hallo J. Eric, hallo Jaime, > Ok after we hit the softlimit we see that the graceperiod are go to > 7 days. I think that?s the default. But was does it mean. > After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. > My interpretation now we can write many gb to the nospace-left event > in the filesystem. > But our intention is to restricted some application to write only to > the hardlimit in the fileset. Any hints to accomplish this? > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information.
> Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > > > > Von: > gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric > Wonderley > Gesendet: Montag, 31. Juli 2017 19:55 > An: gpfsug main discussion list > > > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement > > Hi Renar: > What does 'mmlsquota -j fileset filesystem' report? > I did not think you would get a grace period of none unless the > hardlimit=softlimit. > > On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > > > wrote: > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the > enforcement of hardlimit definitions on a flieset quota. What we see > is we put some 200 GB files on following quota definitions: quota > 150 GB Limit 250 GB Grace none. > After the creating of one 200 GB we hit the softquota limit, thats > ok. But After the the second file was created!! we expect an io > error but it don?t happen. We define all well know Parameters > (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota > are already running at first. > Regards Renar. > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > > Telefon: > > 09561 96-44110 > > Telefax: > > 09561 96-44104 > > E-Mail: > > Renar.Grunenberg at huk-coburg.de > > Internet: > > www.huk.de > > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. 
(MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 31 21:11:14 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 31 Jul 2017 20:11:14 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> Message-ID: <3789811E-523F-47AE-93F3-E2985DD84D60@vanderbilt.edu> Jaime, That?s heavily workload dependent. We run a traditional HPC cluster and have a 7 day grace on home and 14 days on scratch. By setting the soft and hard limits appropriately we?ve slammed the door on many a runaway user / group / fileset. YMMV? Kevin On Jul 31, 2017, at 3:03 PM, Jaime Pinto > wrote: If your grace period is set to 7 days I can assure you that in an HPC environment it's the equivalent of not having quotas effectively. You should set it to 2 hours or 4 hours. In an environment such as ours a runway process can easily generate 500TB of data or 1 billion inodes in few hours, and choke the file system to all users/jobs. Jaime ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL:
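Following up on the mmcheckquota crontab suggestion earlier in this thread, a minimal sketch of a cron entry that reconciles the quota accounting (and with it the in_doubt values) twice a day; the device name fs1, the schedule and the choice of node are placeholders:

# /etc/cron.d/gpfs-checkquota on an admin/manager node: reconcile quota usage at 06:00 and 18:00
0 6,18 * * * root /usr/lpp/mmfs/bin/mmcheckquota fs1

mmcheckquota is I/O intensive on a large file system, so it is usually worth scheduling it off peak.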