From jonathan at buzzard.me.uk Sat Jul 1 10:20:18 2017
From: jonathan at buzzard.me.uk (Jonathan Buzzard)
Date: Sat, 1 Jul 2017 10:20:18 +0100
Subject: [gpfsug-discuss] Mass UID migration suggestions
In-Reply-To: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu>
References: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu>
Message-ID:

On 30/06/17 16:20, hpc-luke at uconn.edu wrote:
> Hello,
>
> We're trying to change most of our users uids, is there a clean way to
> migrate all of one users files with say `mmapplypolicy`? We have to change the
> owner of around 273539588 files, and my estimates for runtime are around 6 days.
>
> What we've been doing is indexing all of the files and splitting them up by
> owner which takes around an hour, and then we were locking the user out while we
> chown their files. I made it multi threaded as it weirdly gave a 10% speedup
> despite my expectation that multi threading access from a single node would not
> give any speedup.
>
> Generally I'm looking for advice on how to make the chowning faster. Would
> spreading the chowning processes over multiple nodes improve performance? Should
> I not stat the files before running lchown on them, since lchown checks the file
> before changing it? I saw mention of inodescan(), in an old gpfsug email, which
> speeds up disk read access, by not guaranteeing that the data is up to date. We
> have a maintenance day coming up where all users will be locked out, so the file
> handles(?) from GPFS's perspective will not be able to go stale. Is there a
> function with similar constraints to inodescan that I can use to speed up this
> process?

My suggestion is to do some development work in C and write a custom
program to do it for you. That way you can hook into the GPFS API and
leverage the fast file system scanning interface. Take a look at the
tsbackup.C file in the samples directory.

Obviously this is going to require someone with appropriate coding
skills to develop it. On the other hand, given that it is a one-off and
the input is strictly controlled, error checking can be kept minimal,
so it should come to a couple of hundred lines of C at most.

My tip for speeding things up would be to load the new UIDs into a
sparse array, so you can just use the current UID to index into the
array for the new UID. It burns RAM, but these days RAM is cheap and
plentiful, and speed is the major consideration here.

In theory you should be able to do the whole migration in a few hours
with this technique.

One thing to bear in mind is that once the UID change is complete you
will have to back up the entire file system again.

JAB.

--
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

From ilan84 at gmail.com Tue Jul 4 09:16:43 2017
From: ilan84 at gmail.com (Ilan Schwarts)
Date: Tue, 4 Jul 2017 11:16:43 +0300
Subject: [gpfsug-discuss] Fail to mount file system
Message-ID:

Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I
am trying to make it work.
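Returning briefly to the mass UID migration question above: before committing to the C tool Jonathan describes, the lookup-table idea is easy to prototype in shell. This is only a sketch: the file names uidmap.txt ("olduid newuid" pairs, one per line) and files.list (one path per line, the sort of list an mmapplypolicy LIST rule can produce) are made up for the illustration, and the per-file fork for stat/chown is exactly the overhead the GPFS-API version in C would avoid.

#!/bin/bash
# Sketch: remap file ownership via a lookup table of old UID -> new UID.
declare -A newuid
while read -r old new; do
    newuid[$old]=$new
done < uidmap.txt

while IFS= read -r path; do
    cur=$(stat -c %u -- "$path")               # current numeric owner
    if [[ -n "${newuid[$cur]:-}" ]]; then
        chown -h "${newuid[$cur]}" -- "$path"  # -h: do not follow symlinks
    fi
done < files.list

Splitting files.list into per-node chunks is then a cheap way to test whether spreading the chown load over several nodes actually helps.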
There are 2 nodes in a cluster: [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active The Cluster status is: [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: MyCluster.LH20-GPFS2 GPFS cluster id: 10777108240438931454 GPFS UID domain: MyCluster.LH20-GPFS2 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 There is a file system: [root at LH20-GPFS1 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- fs_gpfs01 nynsd1 (directly attached) fs_gpfs01 nynsd2 (directly attached) [root at LH20-GPFS1 ~]# On each Node, There is folder /fs_gpfs01 The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. Whilte executing mmmount i get exception: [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. What am i doing wrong ? From scale at us.ibm.com Tue Jul 4 09:36:43 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 14:06:43 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 lab"? Is the file system corrupted ? Maybe this error is then due to file system corruption. Can you once try: mmmount fs_gpfs01 -a If this does not work then try: mmmount -o rs fs_gpfs01 Let me know which mount is working. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: gpfsug-discuss at spectrumscale.org Date: 07/04/2017 01:47 PM Subject: [gpfsug-discuss] Fail to mount file system Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I am trying to make it work. 
There are 2 nodes in a cluster: [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active The Cluster status is: [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: MyCluster.LH20-GPFS2 GPFS cluster id: 10777108240438931454 GPFS UID domain: MyCluster.LH20-GPFS2 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 There is a file system: [root at LH20-GPFS1 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- fs_gpfs01 nynsd1 (directly attached) fs_gpfs01 nynsd2 (directly attached) [root at LH20-GPFS1 ~]# On each Node, There is folder /fs_gpfs01 The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. Whilte executing mmmount i get exception: [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. What am i doing wrong ? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 09:38:28 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 11:38:28 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: I mean the person tried to configure it... didnt do good job so now its me to continue On Jul 4, 2017 11:37, "IBM Spectrum Scale" wrote: > What exactly do you mean by "I have received existing corrupted GPFS > 4.2.2 lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > ------------------------------------------------------------ > --------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine > cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jul 4 11:54:52 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 4 Jul 2017 10:54:52 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Message-ID: Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 11:56:20 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 13:56:20 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. 
[root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts From S.J.Thompson at bham.ac.uk Tue Jul 4 12:09:18 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 4 Jul 2017 11:09:18 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Message-ID: AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I?ve upgraded nodes one at a time over the course of a few days. Is the impact just that we won?t be supported, or will a hole open up beneath my feet and swallow me whole? I really don?t fancy the headache of getting approvals to get an outage of even 5 minutes at 6am?. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Tue Jul 4 12:12:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 4 Jul 2017 11:12:10 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 4 17:28:07 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 21:58:07 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: My bad gave the wrong command, the right one is: mmmount fs_gpfs01 -o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 17:46:17 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 19:46:17 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Yes I am ok with deleting. I follow a guide from john olsen at the ibm team from tuscon.. but the guide had steps after the gpfs setup... Is there step by step guide for gpfs cluster setup other than the one in the ibm site? Thank My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------ ------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111- 0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system ------------------------------ [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > ------------------------------------------------------------ --------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcatana at gmail.com Tue Jul 4 17:47:09 2017 From: jcatana at gmail.com (Josh Catana) Date: Tue, 4 Jul 2017 12:47:09 -0400 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Check /var/adm/ras/mmfs.log.latest The dmesg xfs bug is probably from boot if you look at the dmesg with -T to show the timestamp On Jul 4, 2017 12:29 PM, "IBM Spectrum Scale" wrote: > My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs > > Also can you send output of mmlsnsd -X, need to check device type of the > NSDs. > > Are you ok with deleting the file system and disks and building everything > from scratch? > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ilan Schwarts > To: IBM Spectrum Scale > Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main > discussion list > Date: 07/04/2017 04:26 PM > Subject: Re: [gpfsug-discuss] Fail to mount file system > ------------------------------ > > > > [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a > Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... > LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmdsh: LH20-GPFS1 remote shell process had return code 32. > LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle > mmdsh: LH20-GPFS2 remote shell process had return code 32. > mmmount: Command failed. Examine previous error messages to determine > cause. > > [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 > mmmount: Mount point can not be a relative path name: rs > [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 > mmmount: Mount point can not be a relative path name: rs > > > > I recieve in "dmesg": > > [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk > [ 141.363422] hvt_cn_callback: unexpected netlink message! > [ 141.366153] hvt_cn_callback: unexpected netlink message! > [ 4479.292850] tracedev: loading out-of-tree module taints kernel. > [ 4479.292888] tracedev: module verification failed: signature and/or > required key missing - tainting kernel > [ 4482.928413] ------------[ cut here ]------------ > [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 > xfs_do_writepage+0x537/0x550 [xfs]() > [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) > tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 > mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils > i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc > binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif > crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc > hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy > libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod > [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE > ------------ 3.10.0-514.21.2.el7.x86_64 #1 > > On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale > wrote: > > What exactly do you mean by "I have received existing corrupted GPFS > 4.2.2 > > lab"? > > Is the file system corrupted ? Maybe this error is then due to file > system > > corruption. > > > > Can you once try: mmmount fs_gpfs01 -a > > If this does not work then try: mmmount -o rs fs_gpfs01 > > > > Let me know which mount is working. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------ > ------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > > (GPFS), then please post it to the public IBM developerWroks Forum at > > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) > > and you have an IBM software maintenance contract please contact > > 1-800-237-5511 <(800)%20237-5511> in the United States or your local > IBM Service Center in > > other countries. > > > > The forum is informally monitored as time permits and should not be used > for > > priority messages to the Spectrum Scale (GPFS) team. 
> > > > > > > > From: Ilan Schwarts > > To: gpfsug-discuss at spectrumscale.org > > Date: 07/04/2017 01:47 PM > > Subject: [gpfsug-discuss] Fail to mount file system > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ________________________________ > > > > > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > > am trying to make it work. > > There are 2 nodes in a cluster: > > [root at LH20-GPFS1 ~]# mmgetstate -a > > > > Node number Node name GPFS state > > ------------------------------------------ > > 1 LH20-GPFS1 active > > 3 LH20-GPFS2 active > > > > The Cluster status is: > > [root at LH20-GPFS1 ~]# mmlscluster > > > > GPFS cluster information > > ======================== > > GPFS cluster name: MyCluster.LH20-GPFS2 > > GPFS cluster id: 10777108240438931454 > > GPFS UID domain: MyCluster.LH20-GPFS2 > > Remote shell command: /usr/bin/ssh > > Remote file copy command: /usr/bin/scp > > Repository type: CCR > > > > Node Daemon node name IP address Admin node name Designation > > -------------------------------------------------------------------- > > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > > > There is a file system: > > [root at LH20-GPFS1 ~]# mmlsnsd > > > > File system Disk name NSD servers > > ------------------------------------------------------------ > --------------- > > fs_gpfs01 nynsd1 (directly attached) > > fs_gpfs01 nynsd2 (directly attached) > > > > [root at LH20-GPFS1 ~]# > > > > On each Node, There is folder /fs_gpfs01 > > The next step is to mount this fs_gpfs01 to be synced between the 2 > nodes. > > Whilte executing mmmount i get exception: > > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > > mmmount: Command failed. Examine previous error messages to determine > cause. > > > > > > What am i doing wrong ? > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > -- > > > - > Ilan Schwarts > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 4 19:15:49 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 23:45:49 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: You can refer to the concepts, planning and installation guide at the link ( https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1xx_library_prodoc.htm ) for finding detailed steps on setting up a cluster or creating a file system. Or open a PMR and work with IBM support to set it up. 
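Collected into one place, the sequence in the steps below comes down to something like this, using the file system and NSD names from this thread. It is only a sketch: /dev/sdb and /dev/sdc are placeholders for whatever disks actually back nynsd1 and nynsd2 on the nodes, the NSD definitions are written as a stanza file (the colon-separated descriptors shown below carry the same information), and the mmcrfs options are just the example values used there.

# destroy the existing (corrupted) file system
mmumount fs_gpfs01 -a
mmdelfs fs_gpfs01
mmdelnsd "nynsd1;nynsd2"

# recreate the NSDs from the local disks
cat > /tmp/nsd <<'EOF'
%nsd: device=/dev/sdb nsd=nynsd1 usage=dataAndMetadata failureGroup=1 pool=system
%nsd: device=/dev/sdc nsd=nynsd2 usage=dataAndMetadata failureGroup=2 pool=system
EOF
mmcrnsd -F /tmp/nsd

# recreate and mount the file system
mmcrfs /dev/fs_gpfs01 -F /tmp/nsd -A yes -B 256K -n 32 -m 2 -r 2 -T /fs_gpfs01
mmmount fs_gpfs01 -a

Bear in mind this destroys whatever is left on the existing file system, and if the old disks are not reachable at all, mmdelnsd may need the NSD-volume-ID form described in its man page.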
In your case (just as an example) you can use the below simple steps to delete and recreate the file system: 1) To delete file system and NSDs: a) Unmount file system - mmumount -a b) Delete file system - mmdelfs c) Delete NSDs - mmdelnsd "nynsd1;nynsd2" 2) To create file system with both disks in one system pool and having dataAndMetadata and data and metadata replica and directly attached to the nodes, you can use following steps: a) Create a /tmp/nsd file and fill it up with below information :::dataAndMetadata:1:nynsd1:system :::dataAndMetadata:2:nynsd2:system b) Use mmcrnsd -F /tmp/nsd to create NSDs c) Create file system using (just an example with assumptions on config) - mmcrfs /dev/fs_gpfs01 -F /tmp/nsd -A yes -B 256K -n 32 -m 2 -r 2 -T /fs_gpfs01 You can refer to above guide for configuring it in other ways as you want. If you have any issues with these steps you can raise PMR and follow proper channel to setup file system as well. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 10:16 PM Subject: Re: [gpfsug-discuss] Fail to mount file system Yes I am ok with deleting. I follow a guide from john olsen at the ibm team from tuscon.. but the guide had steps after the gpfs setup... Is there step by step guide for gpfs cluster setup other than the one in the ibm site? Thank My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... 
LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Wed Jul 5 08:02:19 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 10:02:19 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Hi, [root at LH20-GPFS2 ~]# mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- nynsd1 0A0A9E3D594D5CA8 - - LH20-GPFS2 (not found) directly attached nynsd2 0A0A9E3D594D5CA9 - - LH20-GPFS2 (not found) directly attached mmmount failed with -o rs root at LH20-GPFS2 ~]# mmmount fs_gpfs01 -o rs Wed Jul 5 09:58:29 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. and in logs /var/adm/ras/mmfs.log.latest: 2017-07-05_09:58:30.009+0300: [I] Command: mount fs_gpfs01 2017-07-05_09:58:30.890+0300: Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: Wrong medium type 2017-07-05_09:58:30.890+0300: [E] Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: [W] Command: err 48: mount fs_gpfs01 From scale at us.ibm.com Wed Jul 5 08:44:19 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 5 Jul 2017 13:14:19 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: >From mmlsnsd output can see that the disks are not found by gpfs (maybe some connection issue or they have been changed/removed from backend) Please open a PMR and work with IBM support to resolve this. 
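In the meantime it is worth confirming whether the operating system on each node still sees the disks behind nynsd1 and nynsd2 at all, and whether GPFS is scanning the right device names. Roughly along these lines (a sketch; the nsddevices step is only needed if the disks do show up but GPFS still cannot match them):

# does the OS still see the backing disks on this node?
lsblk                   # or: cat /proc/partitions

# which devices did GPFS manage to match to the NSDs?
mmlsnsd -X              # "(not found)" means no local device matched

# if the disks are visible but under names GPFS does not scan by default,
# they can be declared through the nsddevices user exit:
cp /usr/lpp/mmfs/samples/nsddevices.sample /var/mmfs/etc/nsddevices
chmod +x /var/mmfs/etc/nsddevices
# then edit /var/mmfs/etc/nsddevices so it echoes the local device names,
# following the comments in the sample, and re-run mmlsnsd -X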
Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.


From: Ilan Schwarts
To: IBM Spectrum Scale
Cc: gpfsug main discussion list , gpfsug-discuss-bounces at spectrumscale.org
Date: 07/05/2017 12:32 PM
Subject: Re: [gpfsug-discuss] Fail to mount file system

Hi,
[root at LH20-GPFS2 ~]# mmlsnsd -X

 Disk name    NSD volume ID      Device   Devtype  Node name                Remarks
---------------------------------------------------------------------------------------------------
 nynsd1       0A0A9E3D594D5CA8   -        -        LH20-GPFS2 (not found)   directly attached
 nynsd2       0A0A9E3D594D5CA9   -        -        LH20-GPFS2 (not found)   directly attached

mmmount failed with -o rs
root at LH20-GPFS2 ~]# mmmount fs_gpfs01 -o rs
Wed Jul 5 09:58:29 IDT 2017: mmmount: Mounting file systems ...
mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type
mmmount: Command failed. Examine previous error messages to determine cause.

and in logs /var/adm/ras/mmfs.log.latest:
2017-07-05_09:58:30.009+0300: [I] Command: mount fs_gpfs01
2017-07-05_09:58:30.890+0300: Failed to open fs_gpfs01.
2017-07-05_09:58:30.890+0300: Wrong medium type
2017-07-05_09:58:30.890+0300: [E] Failed to open fs_gpfs01.
2017-07-05_09:58:30.890+0300: [W] Command: err 48: mount fs_gpfs01

From UWEFALKE at de.ibm.com Wed Jul 5 09:00:23 2017
From: UWEFALKE at de.ibm.com (Uwe Falke)
Date: Wed, 5 Jul 2017 10:00:23 +0200
Subject: [gpfsug-discuss] Fail to mount file system
In-Reply-To:
References:
Message-ID:

Hi,
maybe you need to specify your NSDs via the nsddevices user exit (it identifies local physical devices that are used as GPFS Network Shared Disks (NSDs)). Write a script that lists the NSD devices and place it under /var/mmfs/etc/nsddevices. There is a template under /usr/lpp/mmfs/samples/nsddevices.sample which should provide the necessary details.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke

IT Specialist
High Performance Computing Services / Integrated Technology Services / Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefalke at de.ibm.com
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: Andreas Hasse, Thomas Wolter
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122

From ilan84 at gmail.com Wed Jul 5 13:12:14 2017
From: ilan84 at gmail.com (Ilan Schwarts)
Date: Wed, 5 Jul 2017 15:12:14 +0300
Subject: [gpfsug-discuss] update smb package ?
Message-ID: Hi, while trying to enable SMB service i receive the following root at LH20-GPFS1 ~]# mmces service enable smb LH20-GPFS1: Cannot enable SMB service on LH20-GPFS1 LH20-GPFS1: mmcesop: Prerequisite libraries not found or correct version not LH20-GPFS1: installed. Ensure gpfs.smb is properly installed. LH20-GPFS1: mmcesop: Command failed. Examine previous error messages to determine cause. mmdsh: LH20-GPFS1 remote shell process had return code 1. Do i use normal yum update ? how to solve this issue ? Thanks From ilan84 at gmail.com Wed Jul 5 13:18:54 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 15:18:54 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs gpfs.ext-4.2.2-0.x86_64 gpfs.msg.en_US-4.2.2-0.noarch gpfs.gui-4.2.2-0.noarch gpfs.gpl-4.2.2-0.noarch gpfs.gskit-8.0.50-57.x86_64 gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 gpfs.adv-4.2.2-0.x86_64 gpfs.java-4.2.2-0.x86_64 gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 gpfs.base-4.2.2-0.x86_64 gpfs.crypto-4.2.2-0.x86_64 [root at LH20-GPFS1 ~]# uname -a Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux [root at LH20-GPFS1 ~]# From r.sobey at imperial.ac.uk Wed Jul 5 13:23:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 5 Jul 2017 12:23:10 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: You don't have the gpfs.smb package installed. Yum install gpfs.smb Or install the package manually from /usr/lpp/mmfs//smb_rpms [root at ces ~]# rpm -qa | grep gpfs gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan Schwarts Sent: 05 July 2017 13:19 To: gpfsug main discussion list Subject: [gpfsug-discuss] Fwd: update smb package ? [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs gpfs.ext-4.2.2-0.x86_64 gpfs.msg.en_US-4.2.2-0.noarch gpfs.gui-4.2.2-0.noarch gpfs.gpl-4.2.2-0.noarch gpfs.gskit-8.0.50-57.x86_64 gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 gpfs.adv-4.2.2-0.x86_64 gpfs.java-4.2.2-0.x86_64 gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 gpfs.base-4.2.2-0.x86_64 gpfs.crypto-4.2.2-0.x86_64 [root at LH20-GPFS1 ~]# uname -a Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux [root at LH20-GPFS1 ~]# _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Wed Jul 5 13:29:11 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 15:29:11 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? 
In-Reply-To: References: Message-ID: 

[root at LH20-GPFS1 ~]# yum install gpfs.smb
Loaded plugins: fastestmirror, langpacks
base                                | 3.6 kB  00:00:00
epel/x86_64/metalink                |  24 kB  00:00:00
epel                                | 4.3 kB  00:00:00
extras                              | 3.4 kB  00:00:00
updates                             | 3.4 kB  00:00:00
(1/4): epel/x86_64/updateinfo       | 789 kB  00:00:00
(2/4): extras/7/x86_64/primary_db   | 188 kB  00:00:00
(3/4): epel/x86_64/primary_db       | 4.8 MB  00:00:00
(4/4): updates/7/x86_64/primary_db  | 7.7 MB  00:00:01
Loading mirror speeds from cached hostfile
 * base: centos.spd.co.il
 * epel: mirror.nonstop.co.il
 * extras: centos.spd.co.il
 * updates: centos.spd.co.il
No package gpfs.smb available.
Error: Nothing to do [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ something is missing in my machine :) On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: > You don't have the gpfs.smb package installed. > > > > Yum install gpfs.smb > > > > Or install the package manually from /usr/lpp/mmfs//smb_rpms > > > > [root at ces ~]# rpm -qa | grep gpfs > > gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 > > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan > Schwarts > Sent: 05 July 2017 13:19 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Fwd: update smb package ? > > > > [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs > > gpfs.ext-4.2.2-0.x86_64 > > gpfs.msg.en_US-4.2.2-0.noarch > > gpfs.gui-4.2.2-0.noarch > > gpfs.gpl-4.2.2-0.noarch > > gpfs.gskit-8.0.50-57.x86_64 > > gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 > > gpfs.adv-4.2.2-0.x86_64 > > gpfs.java-4.2.2-0.x86_64 > > gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 > > gpfs.base-4.2.2-0.x86_64 > > gpfs.crypto-4.2.2-0.x86_64 > > [root at LH20-GPFS1 ~]# uname -a > > Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 > > 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux > > [root at LH20-GPFS1 ~]# > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From ilan84 at gmail.com Wed Jul 5 14:08:39 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 16:08:39 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: Sorry for newbish question, What do you mean by "from Fix Central", Do i need to define another repository for the yum ? or download manually ? its spectrum scale 4.2.2 On Wed, Jul 5, 2017 at 3:41 PM, Sobey, Richard A wrote: > Ah... yes you need to download the protocols version of gpfs from Fix Central. Same GPFS but with the SMB/Object etc packages. > > -----Original Message----- > From: Ilan Schwarts [mailto:ilan84 at gmail.com] > Sent: 05 July 2017 13:29 > To: gpfsug main discussion list ; Sobey, Richard A > Subject: Re: [gpfsug-discuss] Fwd: update smb package ? > > [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base > > | 3.6 kB 00:00:00 > epel/x86_64/metalink > > | 24 kB 00:00:00 > epel > > | 4.3 kB 00:00:00 > extras > > | 3.4 kB 00:00:00 > updates > > | 3.4 kB 00:00:00 > (1/4): epel/x86_64/updateinfo > > | 789 kB 00:00:00 > (2/4): extras/7/x86_64/primary_db > > | 188 kB 00:00:00 > (3/4): epel/x86_64/primary_db > > | 4.8 MB 00:00:00 > (4/4): updates/7/x86_64/primary_db > > | 7.7 MB 00:00:01 > Loading mirror speeds from cached hostfile > * base: centos.spd.co.il > * epel: mirror.nonstop.co.il > * extras: centos.spd.co.il > * updates: centos.spd.co.il > No package gpfs.smb available. > Error: Nothing to do > > > [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ > gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ > > > something is missing in my machine :) > > > On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: >> You don't have the gpfs.smb package installed. 
>> >> >> >> Yum install gpfs.smb >> >> >> >> Or install the package manually from /usr/lpp/mmfs//smb_rpms >> >> >> >> [root at ces ~]# rpm -qa | grep gpfs >> >> gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 >> >> >> >> >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan >> Schwarts >> Sent: 05 July 2017 13:19 >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] Fwd: update smb package ? >> >> >> >> [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs >> >> gpfs.ext-4.2.2-0.x86_64 >> >> gpfs.msg.en_US-4.2.2-0.noarch >> >> gpfs.gui-4.2.2-0.noarch >> >> gpfs.gpl-4.2.2-0.noarch >> >> gpfs.gskit-8.0.50-57.x86_64 >> >> gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 >> >> gpfs.adv-4.2.2-0.x86_64 >> >> gpfs.java-4.2.2-0.x86_64 >> >> gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 >> >> gpfs.base-4.2.2-0.x86_64 >> >> gpfs.crypto-4.2.2-0.x86_64 >> >> [root at LH20-GPFS1 ~]# uname -a >> >> Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 >> >> 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux >> >> [root at LH20-GPFS1 ~]# >> >> _______________________________________________ >> >> gpfsug-discuss mailing list >> >> gpfsug-discuss at spectrumscale.org >> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > -- > > > - > Ilan Schwarts -- - Ilan Schwarts From S.J.Thompson at bham.ac.uk Wed Jul 5 14:40:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 5 Jul 2017 13:40:46 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: IBM code comes from either IBM Passport Advantage (where you sign in with a corporate account that lists your product associations), or from IBM Fix Central (google it). Fix Central is supposed to be for service updates. Give the lack of experience, you may want to look at the install toolkit which ships with Spectrum Scale. Simon On 05/07/2017, 14:08, "gpfsug-discuss-bounces at spectrumscale.org on behalf of ilan84 at gmail.com" wrote: >Sorry for newbish question, >What do you mean by "from Fix Central", >Do i need to define another repository for the yum ? or download manually >? >its spectrum scale 4.2.2 > >On Wed, Jul 5, 2017 at 3:41 PM, Sobey, Richard A >wrote: >> Ah... yes you need to download the protocols version of gpfs from Fix >>Central. Same GPFS but with the SMB/Object etc packages. >> >> -----Original Message----- >> From: Ilan Schwarts [mailto:ilan84 at gmail.com] >> Sent: 05 July 2017 13:29 >> To: gpfsug main discussion list ; >>Sobey, Richard A >> Subject: Re: [gpfsug-discuss] Fwd: update smb package ? >> >> [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: >>fastestmirror, langpacks base >> >> | 3.6 kB 00:00:00 >> epel/x86_64/metalink >> >> | 24 kB 00:00:00 >> epel >> >> | 4.3 kB 00:00:00 >> extras >> >> | 3.4 kB 00:00:00 >> updates >> >> | 3.4 kB 00:00:00 >> (1/4): epel/x86_64/updateinfo >> >> | 789 kB 00:00:00 >> (2/4): extras/7/x86_64/primary_db >> >> | 188 kB 00:00:00 >> (3/4): epel/x86_64/primary_db >> >> | 4.8 MB 00:00:00 >> (4/4): updates/7/x86_64/primary_db >> >> | 7.7 MB 00:00:01 >> Loading mirror speeds from cached hostfile >> * base: centos.spd.co.il >> * epel: mirror.nonstop.co.il >> * extras: centos.spd.co.il >> * updates: centos.spd.co.il >> No package gpfs.smb available. 
>> Error: Nothing to do >> >> >> [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ >> gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ >> >> >> something is missing in my machine :) >> >> >> On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A >> wrote: >>> You don't have the gpfs.smb package installed. >>> >>> >>> >>> Yum install gpfs.smb >>> >>> >>> >>> Or install the package manually from /usr/lpp/mmfs//smb_rpms >>> >>> >>> >>> [root at ces ~]# rpm -qa | grep gpfs >>> >>> gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 >>> >>> >>> >>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan >>> Schwarts >>> Sent: 05 July 2017 13:19 >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] Fwd: update smb package ? >>> >>> >>> >>> [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs >>> >>> gpfs.ext-4.2.2-0.x86_64 >>> >>> gpfs.msg.en_US-4.2.2-0.noarch >>> >>> gpfs.gui-4.2.2-0.noarch >>> >>> gpfs.gpl-4.2.2-0.noarch >>> >>> gpfs.gskit-8.0.50-57.x86_64 >>> >>> gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 >>> >>> gpfs.adv-4.2.2-0.x86_64 >>> >>> gpfs.java-4.2.2-0.x86_64 >>> >>> gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 >>> >>> gpfs.base-4.2.2-0.x86_64 >>> >>> gpfs.crypto-4.2.2-0.x86_64 >>> >>> [root at LH20-GPFS1 ~]# uname -a >>> >>> Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 >>> >>> 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux >>> >>> [root at LH20-GPFS1 ~]# >>> >>> _______________________________________________ >>> >>> gpfsug-discuss mailing list >>> >>> gpfsug-discuss at spectrumscale.org >>> >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> -- >> >> >> - >> Ilan Schwarts > > > >-- > > >- >Ilan Schwarts >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From hpc-luke at uconn.edu Wed Jul 5 15:52:52 2017 From: hpc-luke at uconn.edu (hpc-luke at uconn.edu) Date: Wed, 05 Jul 2017 10:52:52 -0400 Subject: [gpfsug-discuss] Mass UID migration suggestions Message-ID: <595cfd44.kc2G2OUXdgiX+srO%hpc-luke@uconn.edu> Thank you both, I was already using the c++ stl hash map to do the mapping of uid_t to uid_t, but I will use that example to learn how to use the proper gpfs apis. And thank you for the ACL suggestion, as that is likely the best way to handle certain users who are logged in/running jobs constantly, where we would not like to force them to logout. And thank you for the reminder to re-run backups. Thank you for your time, Luke Storrs-HPC University of Connecticut From mweil at wustl.edu Wed Jul 5 16:51:50 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 5 Jul 2017 10:51:50 -0500 Subject: [gpfsug-discuss] pmcollector node Message-ID: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> Hello all, Question on the requirements on pmcollector node/s for a 500+ node cluster. Is there a sizing guide? What specifics should we scale? CPU Disks memory? Thanks Matt From kkr at lbl.gov Wed Jul 5 17:23:38 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 5 Jul 2017 09:23:38 -0700 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Message-ID: As I understand it, there is currently no way to collect just a subset of stats in a category. 
For example, CPU stats are: cpu_contexts cpu_guest cpu_guest_nice cpu_hiq cpu_idle cpu_interrupts cpu_iowait cpu_nice cpu_siq cpu_steal cpu_system cpu_user but I'm only interested in tracking a subset. The config file seems to want the category "CPU" which seems like an all-or-nothing approach. I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? Thanks, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 5 18:00:44 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 5 Jul 2017 17:00:44 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Message-ID: <11A5144D-A5AF-4829-B7D4-4313F357C6CB@nuance.com> Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Jul 5 19:22:14 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 5 Jul 2017 11:22:14 -0700 Subject: [gpfsug-discuss] Meaning of API Stats Category In-Reply-To: References: Message-ID: Thank you Eric. That did help. On Mon, Jun 12, 2017 at 2:01 PM, IBM Spectrum Scale wrote: > Hello Kristy, > > The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of > view of "applications" in the sense that they provide stats about I/O > requests made to files in GPFS file systems from user level applications > using POSIX interfaces like open(), close(), read(), write(), etc. > > This is in contrast to similarly named sensors without the "API" suffix, > like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O > requests made by the GPFS code to NSDs (disks) making up GPFS file systems. > > The relationship between application I/O and disk I/O might or might not > be obvious. Consider some examples. An application that starts > sequentially reading a file might, at least initially, cause more disk I/O > than expected because GPFS has decided to prefetch data. An application > write() might not immediately cause a the writing of disk blocks due to the > operation of the pagepool. Ultimately, application write()s might cause > twice as much data written to disk due to the replication factor of the > file system. Application I/O concerns itself with user data; disk I/O > might have to occur to handle the user data and associated file system > metadata (like inodes and indirect blocks). > > The difference between GPFSFileSystemAPI and GPFSNodeAPI: > GPFSFileSystemAPI reports stats for application I/O per filesystem per > node; GPFSNodeAPI reports application I/O stats per node. Similarly, > GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode > reports disk I/O stats per node. > > I hope this helps. 
> Eric Agar > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 06/12/2017 04:43 PM > Subject: Re: [gpfsug-discuss] Meaning of API Stats Category > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Kristy > > What I *think* the difference is: > > gpfs_fis: - calls to the GPFS file system interface > gpfs_fs: calls from the node that actually make it to the NSD > server/metadata > > The difference being what?s served out of the local node pagepool. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > *From: * on behalf of Kristy > Kallback-Rose > * Reply-To: *gpfsug main discussion list > > * Date: *Monday, June 12, 2017 at 3:17 PM > * To: *gpfsug main discussion list > * Subject: *[EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category > > Hi, > > Can anyone provide more detail about what is meant by the following two > categories of stats? The PDG has a limited description as far as I could > see. I'm not sure what is meant by Application PoV. Would the Grafana > bridge count as an "application"? > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfadden at us.ibm.com Wed Jul 5 19:50:24 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Wed, 5 Jul 2017 18:50:24 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks In-Reply-To: <11A5144D-A5AF-4829-B7D4-4313F357C6CB@nuance.com> Message-ID: What do You mean by category? Node class, metric type or something else? On Jul 5, 2017, 10:01:33 AM, Robert.Oesterlin at nuance.com wrote: From: Robert.Oesterlin at nuance.com To: gpfsug-discuss at spectrumscale.org Cc: Date: Jul 5, 2017 10:01:33 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sfadden at us.ibm.com Wed Jul 5 19:51:46 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Wed, 5 Jul 2017 18:51:46 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks In-Reply-To: Message-ID: Never mind just saw your earlier email On Jul 5, 2017, 11:50:24 AM, sfadden at us.ibm.com wrote: From: sfadden at us.ibm.com To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: Jul 5, 2017 11:50:24 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks What do You mean by category? Node class, metric type or something else? On Jul 5, 2017, 10:01:33 AM, Robert.Oesterlin at nuance.com wrote: From: Robert.Oesterlin at nuance.com To: gpfsug-discuss at spectrumscale.org Cc: Date: Jul 5, 2017 10:01:33 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Jul 6 06:37:33 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 6 Jul 2017 11:07:33 +0530 Subject: [gpfsug-discuss] pmcollector node In-Reply-To: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> References: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> Message-ID: Hi Anna, Can you please check if you can answer this. Or else let me know who to contact for this. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Matt Weil To: gpfsug-discuss at spectrumscale.org Date: 07/05/2017 09:22 PM Subject: [gpfsug-discuss] pmcollector node Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello all, Question on the requirements on pmcollector node/s for a 500+ node cluster. Is there a sizing guide? What specifics should we scale? CPU Disks memory? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Wei1.Guo at UTSouthwestern.edu Thu Jul 6 18:49:32 2017 From: Wei1.Guo at UTSouthwestern.edu (Wei Guo) Date: Thu, 6 Jul 2017 17:49:32 +0000 Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory Message-ID: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> Hi, All, We are testing to upgrade our clients to new RHEL 7.3 kernel with GPFS 4.2.1.0. When we have 3.10.0-514.26.2.el7, installing the gplbin has the following errors: # ./mmbuildgpl --build-package -v # cd /root/rpmbuild/RPMS/x86_64/ # rpm -ivh gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64.rpm Running transaction Installing : gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64 1/1 depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory depmod: ERROR: fstatat(4, mmfslinux.ko): No such file or directory depmod: ERROR: fstatat(4, tracedev.ko): No such file or directory depmod -a also show the three kernel extension not found. However, in the following directory, they are there. # pwd /lib/modules/3.10.0-514.26.2.el7.x86_64/extra # ls kernel mmfs26.ko mmfslinux.ko tracedev.ko The error does not show in a slightly older kernel -3.10.0-514.21.2 version. From https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Table 29, both versions should be supported. RHEL Distribution Latest Kernel Level Tested1 Minimum Kernel Level Required2 Minimum IBM Spectrum Scale Level Tested3 Minimum IBM Spectrum Scale Level Supported4 7.3 3.10.0-514 3.10.0-514 V4.1.1.11/V4.2.2.1 V4.1.1.11/V4.2.1.2 For technical reasons, this test node will not be added to production. A previous thread http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-April/001529.html indicated that this will be OK. However, it is better to get a clear conclusion before we update other client nodes. Shall we recompile the kernel? Thanks all. Wei Guo ________________________________ UT Southwestern Medical Center The future of medicine, today. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 6 18:52:44 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 6 Jul 2017 17:52:44 +0000 Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory In-Reply-To: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> References: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> Message-ID: Look in the kernel weak-updates directory, you will probably find some broken files in there. These come from things trying to update the kernel modules when you do the kernel upgrade. Just delete the three gpfs related ones and run depmod The safest way is to remove the gpfs.gplbin packages, then upgrade the kernel, reboot and add the new gpfs.gplbin packages for the new kernel. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Wei Guo [Wei1.Guo at UTSouthwestern.edu] Sent: 06 July 2017 18:49 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory Hi, All, We are testing to upgrade our clients to new RHEL 7.3 kernel with GPFS 4.2.1.0. 
When we have 3.10.0-514.26.2.el7, installing the gplbin has the following errors: # ./mmbuildgpl --build-package ?v # cd /root/rpmbuild/RPMS/x86_64/ # rpm -ivh gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64.rpm Running transaction Installing : gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64 1/1 depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory depmod: ERROR: fstatat(4, mmfslinux.ko): No such file or directory depmod: ERROR: fstatat(4, tracedev.ko): No such file or directory depmod -a also show the three kernel extension not found. However, in the following directory, they are there. # pwd /lib/modules/3.10.0-514.26.2.el7.x86_64/extra # ls kernel mmfs26.ko mmfslinux.ko tracedev.ko The error does not show in a slightly older kernel -3.10.0-514.21.2 version. From https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Table 29, both versions should be supported. RHEL Distribution Latest Kernel Level Tested1 Minimum Kernel Level Required2 Minimum IBM Spectrum Scale Level Tested3 Minimum IBM Spectrum Scale Level Supported4 7.3 3.10.0-514 3.10.0-514 V4.1.1.11/V4.2.2.1 V4.1.1.11/V4.2.1.2 For technical reasons, this test node will not be added to production. A previous thread http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-April/001529.html indicated that this will be OK. However, it is better to get a clear conclusion before we update other client nodes. Shall we recompile the kernel? Thanks all. Wei Guo ________________________________ UT Southwestern Medical Center The future of medicine, today. From abeattie at au1.ibm.com Thu Jul 6 06:07:07 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 6 Jul 2017 05:07:07 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14992893800360.png Type: image/png Size: 431718 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14992893800362.png Type: image/png Size: 1001127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14993172756190.png Type: image/png Size: 381651 bytes Desc: not available URL: From neil.wilson at metoffice.gov.uk Fri Jul 7 10:18:40 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Fri, 7 Jul 2017 09:18:40 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: Hi Andrew, Have you created new dashboards for GPFS? This shows you how to do it https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Creating%20Grafana%20dashboard Alternatively there are some predefined dashboards here https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Importing%20predefined%20Grafana%20dashboards that you can import and have a play around with? Regards Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. 
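As a quick aside on the "services running but no data" symptom: before reworking dashboards it can help to confirm where the data stops. A minimal sketch follows, assuming the 4.2.x mmperfmon query syntax and the default bridge port (4242) from the bridge setup guide; the /api/suggest call is the OpenTSDB-style request that Grafana's metric lookup uses, so both the port and the URL are assumptions to adjust for your environment.

# 1. On a collector node, ask the collector for a metric directly.
#    If this prints rows of values, sensors and collector are fine and
#    the gap is in the bridge or in the Grafana data source definition.
mmperfmon query cpu_user

# 2. Check that the Grafana bridge (zimonGrafanaIntf) is listening and
#    answering; replace <collector-node> with the host running the bridge.
ss -ltnp | grep 4242
curl -s "http://<collector-node>:4242/api/suggest?type=metrics" | head -c 300
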
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 06 July 2017 06:07 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Greetings, I'm currently setting up Grafana to interact with one of our Scale Clusters and i've followed the knowledge centre link in terms of setup. https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm However while everything appears to be working i'm not seeing any data coming through the reports within the grafana server, even though I can see data in the Scale GUI The current environment: [root at sc01n02 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: sc01.spectrum GPFS cluster id: 18085710661892594990 GPFS UID domain: sc01.spectrum Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------ 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon [root at sc01n02 ~]# [root at sc01n02 ~]# mmlsconfig Configuration data for cluster sc01.spectrum: --------------------------------------------- clusterName sc01.spectrum clusterId 18085710661892594990 autoload yes profile gpfsProtocolDefaults dmapiFileHandleSize 32 minReleaseLevel 4.2.2.0 ccrEnabled yes cipherList AUTHONLY maxblocksize 16M [cesNodes] maxMBpS 5000 numaMemoryInterleave yes enforceFilesetQuotaOnRoot yes workerThreads 512 [common] tscCmdPortRange 60000-61000 cesSharedRoot /ibm/cesSharedRoot/ces cifsBypassTraversalChecking yes syncSambaMetadataOps yes cifsBypassShareLocksOnRename yes adminMode central File systems in cluster sc01.spectrum: -------------------------------------- /dev/cesSharedRoot /dev/icos_demo /dev/scale01 [root at sc01n02 ~]# [root at sc01n02 ~]# systemctl status pmcollector ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. Loaded: loaded (/etc/rc.d/init.d/pmcollector) Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago Docs: man:systemd-sysv-generator(8) Main PID: 2693 (ZIMonCollector) CGroup: /system.slice/pmcollector.service ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance mon...... May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor collector... May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance moni...r.. Hint: Some lines were ellipsized, use -l to show in full. From Grafana Server: [cid:image002.jpg at 01D2F70A.17F595F0] when I send a set of files to the cluster (3.8GB) I can see performance metrics within the Scale GUI [cid:image004.jpg at 01D2F70A.17F595F0] yet from the Grafana Dashboard im not seeing any data points [cid:image006.jpg at 01D2F70A.17F595F0] Can anyone provide some hints as to what might be happening? Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 14522 bytes Desc: image002.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 60060 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 25781 bytes Desc: image006.jpg URL: From olaf.weiser at de.ibm.com Fri Jul 7 10:18:13 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 7 Jul 2017 09:18:13 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 431718 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1001127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 381651 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Jul 7 13:01:39 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 7 Jul 2017 12:01:39 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: Just following up on this, has anyone successfully deployed Protocols (SMB) on RHEL 7.3 with the 4.2.3-2 packages? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 July 2017 12:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. 
Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Jul 7 23:32:40 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 7 Jul 2017 15:32:40 -0700 Subject: [gpfsug-discuss] Hold the Date - Spectrum Scale Day @ HPCXXL (Sept 2017, NYC) Message-ID: Hello, More details will be provided as they become available, but just so you can make a placeholder on your calendar, there will be a Spectrum Scale Day the week of September 25th - 29th, likely Thursday, September 28th. This will be a part of the larger HPCXXL meeting (https://www.spxxl.org/?q=New-York-City-2017 ). You may recall this group was formerly called SPXXL and the website is in the process of transitioning to the new name (and getting a new certificate). You will be able to attend *just* the Spectrum Scale day if that is the only portion of the event you would like to attend. The NYC location is great for Spectrum Scale events because many IBMers, including developers, can come in from Poughkeepsie. More as we get closer to the date and details are settled. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Sun Jul 9 08:26:44 2017 From: a.khiredine at meteo.dz (Atmane) Date: Sun, 9 Jul 2017 08:26:44 +0100 Subject: [gpfsug-discuss] GPFS Storage Server (GSS) Message-ID: From a.khiredine at meteo.dz Sun Jul 9 09:00:07 2017 From: a.khiredine at meteo.dz (Atmane) Date: Sun, 9 Jul 2017 09:00:07 +0100 Subject: [gpfsug-discuss] get free space in GSS Message-ID: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From laurence at qsplace.co.uk Sun Jul 9 09:58:05 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Sun, 09 Jul 2017 09:58:05 +0100 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: Message-ID: You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: >Dear all, > >My name is Khiredine Atmane and I am a HPC system administrator at the > >National Office of Meteorology Algeria . We have a GSS24 running >gss2.5.10.3-3b and gpfs-4.2.0.3. 
> >GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks >total, 0 >NVRAM partitions > >disks = 3Tb >SSD = 200 Gb >df -h >Filesystem Size Used Avail Use% Mounted on > >/dev/gpfs1 49T 18T 31T 38% /gpfs1 >/dev/gpfs2 53T 13T 40T 25% /gpfs2 >/dev/gpfs3 25T 4.9T 20T 21% /gpfs3 >/dev/gpfs4 11T 133M 11T 1% /gpfs4 >/dev/gpfs5 323T 34T 290T 11% /gpfs5 > >Total Is 461 To > >I think we have more space >Could anyone make recommendation to troubleshoot find how many free >space >in GSS ? >How to find the available space ? >Thank you! > >Atmane > > > >-- >Atmane Khiredine >HPC System Admin | Office National de la M?t?orologie >T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : >a.khiredine at meteo.dz >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Sun Jul 9 13:26:26 2017 From: a.khiredine at meteo.dz (atmane khiredine) Date: Sun, 9 Jul 2017 12:26:26 +0000 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: , Message-ID: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> thank you very much for replying. I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual 
rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. 
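As an aside, a minimal shell sketch for pulling the per-declustered-array free space out of that output; the awk column positions are an assumption based on the -L layout shown in this thread and may differ on other GSS/ESS code levels, so treat it as a starting point rather than a supported tool.

#!/bin/bash
# dafree.sh - print the free space of each declustered array in the
# recovery groups named on the command line, e.g.:
#   ./dafree.sh BB1RGL BB1RGR
# Assumes the "mmlsrecoverygroup <rg> -L" layout above, where the DA
# summary rows carry the free space in columns 7 and 8.
for rg in "$@"; do
    echo "=== $rg ==="
    mmlsrecoverygroup "$rg" -L | awk '
        $1 ~ /^(DA[0-9]+|LOG)$/ && $2 ~ /^(yes|no)$/ {
            printf "  %-4s free space: %s %s\n", $1, $7, $8
        }'
done

Note that this reports unallocated space inside the recovery groups; for the capacity of an existing file system, mmdf <filesystem> is the usual check.
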
From janfrode at tanso.net Sun Jul 9 17:45:32 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sun, 09 Jul 2017 16:45:32 +0000 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> References: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: You had it here: [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low 12 GiB in DA1, and 4096 MiB i DA2, but effectively you'll get less when you add a raidCode to the vdisk. Best way to use it id to just don't specify a size to the vdisk, and max possible size will be used. -jf s?n. 9. jul. 2017 kl. 14.26 skrev atmane khiredine : > thank you very much for replying. I can not find the free space > > Here is the output of mmlsrecoverygroup > > [root at server1 ~]#mmlsrecoverygroup > > declustered > arrays with > recovery group vdisks vdisks servers > ------------------ ----------- ------ ------- > BB1RGL 3 18 server1,server2 > BB1RGR 3 18 server2,server1 > -------------------------------------------------------------- > [root at server ~]# mmlsrecoverygroup BB1RGL -L > > declustered > recovery group arrays vdisks pdisks format version > ----------------- ----------- ------ ------ -------------- > BB1RGL 3 18 119 4.2.0.1 > > declustered needs replace > scrub background activity > array service vdisks pdisks spares threshold free space > duration task progress priority > ----------- ------- ------ ------ ------ --------- ---------- > -------- ------------------------- > LOG no 1 3 0,0 1 558 GiB 14 > days scrub 51% low > DA1 no 11 58 2,31 2 12 GiB 14 > days scrub 78% low > DA2 no 6 58 2,31 2 4096 MiB 14 > days scrub 10% low > > declustered > checksum > vdisk RAID code array vdisk size block > size granularity state remarks > ------------------ ------------------ ----------- ---------- > ---------- ----------- ----- ------- > gss0_logtip 3WayReplication LOG 128 MiB 1 > MiB 512 ok logTip > gss0_loghome 4WayReplication DA1 40 GiB 1 > MiB 512 ok log > BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 > MiB 32 KiB ok > BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 > MiB 32 KiB ok > BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 > MiB 32 KiB ok > BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 > MiB 32 KiB ok > > config data declustered 
array VCD spares actual rebuild > spare space remarks > ------------------ ------------------ ------------- > --------------------------------- ---------------- > rebuild space DA1 31 34 pdisk > rebuild space DA2 31 35 pdisk > > > config data max disk group fault tolerance actual disk group > fault tolerance remarks > ------------------ --------------------------------- > --------------------------------- ---------------- > rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 > drawer limiting fault tolerance > system index 2 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > > vdisk max disk group fault tolerance actual disk group > fault tolerance remarks > ------------------ --------------------------------- > --------------------------------- ---------------- > gss0_logtip 2 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS4_DATA1 2 drawer 2 drawer > BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS1_DATA1 2 drawer 2 drawer > BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS3_DATA1 2 drawer 2 drawer > BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS2_DATA1 2 drawer 2 drawer > BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS2_DATA2 2 drawer 2 drawer > BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS1_DATA2 2 drawer 2 drawer > BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS5_DATA1 2 drawer 2 drawer > BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS5_DATA2 2 drawer 2 drawer > > active recovery group server servers > ----------------------------------------------- ------- > server1 server1,server2 > > > Atmane Khiredine > HPC System Administrator | Office National de la M?t?orologie > T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : > a.khiredine at meteo.dz > ________________________________ > De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] > Envoy? : dimanche 9 juillet 2017 09:58 > ? : gpfsug main discussion list; atmane khiredine; > gpfsug-discuss at spectrumscale.org > Objet : Re: [gpfsug-discuss] get free space in GSS > > You can check the recovery groups to see if there is any remaining space. > > I don't have access to my test system to confirm the syntax however if > memory serves. > > Run mmlsrecoverygroup to get a list of all the recovery groups then: > > mmlsrecoverygroup -L > > This will list all your declustered arrays and their free space. > > Their might be another method, however this way has always worked well for > me. > > -- Lauz > > > > On 9 July 2017 09:00:07 BST, Atmane wrote: > > Dear all, > > My name is Khiredine Atmane and I am a HPC system administrator at the > National Office of Meteorology Algeria . We have a GSS24 running > gss2.5.10.3-3b and gpfs-4.2.0.3. 
> > GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 > NVRAM partitions > > disks = 3Tb > SSD = 200 Gb > df -h > Filesystem Size Used Avail Use% Mounted on > > /dev/gpfs1 49T 18T 31T 38% /gpfs1 > /dev/gpfs2 53T 13T 40T 25% /gpfs2 > /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 > /dev/gpfs4 11T 133M 11T 1% /gpfs4 > /dev/gpfs5 323T 34T 290T 11% /gpfs5 > > Total Is 461 To > > I think we have more space > Could anyone make recommendation to troubleshoot find how many free space > in GSS ? > How to find the available space ? > Thank you! > > Atmane > > > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Sun Jul 9 17:52:02 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Sun, 9 Jul 2017 12:52:02 -0400 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> References: , <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: Hi Atmane, >> I can not find the free space Based on your output below, your setup currently has two recovery groups BB1RGL and BB1RGR. Issue "mmlsrecoverygroup BB1RGL -L" and "mmlsrecoverygroup BB1RGR -L" to obtain free space in each DA. Based on your "mmlsrecoverygroup BB1RGL -L" output below, BB1RGL "DA1" has 12GiB and "DA2" has 4GiB free space. The metadataOnly and dataOnly vdisk/NSD are created from DA1 and DA2. declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low In addition, you may use "mmlsnsd" to obtain mapping of file-system to vdisk/NSD + use "mmdf " command to query user or available capacity on a GPFS file system. Hope this helps, -Kums From: atmane khiredine To: Laurence Horrocks-Barlow , "gpfsug main discussion list" Date: 07/09/2017 08:27 AM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org thank you very much for replying. 
I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 
drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Mon Jul 10 10:39:27 2017 From: a.khiredine at meteo.dz (Atmane) Date: Mon, 10 Jul 2017 10:39:27 +0100 Subject: [gpfsug-discuss] New Version Of GSS 3.1b 16-Feb-2017 Message-ID: Dear all, There is a new version of GSS Is there someone who made the update ? thanks Lenovo System x GPFS Storage Server (GSS) Version 3.1b 16-Feb-2017 What?s new in Lenovo GSS, Version 3.1 ? New features: - RHEL 7.2 ? GSS Expandability ? Online addition of more JBODs to an existing GSS building block (max. 6 JBOD total) ? Must be same JBOD type and drive type as in the existing building block ? Selectable Spectrum Scale (GPFS) software version and edition ?Four GSS tarballs, for Spectrum Scale {Standard or Advanced Edition} @ {v4.1.1 or v4.2.1} ? Hardware news: ? 10TB drive support: two JBOD MTMs (0796-HCJ/16X and 0796-HCK/17X), drive FRU (01GV110), no drive option ? Withdrawal of the 3TB drive models (0796-HC3/07X and 0796-HC4/08X) ? GSS22 in xConfig (no more need for special-bid) ? Software and firmware news: ? Update of IBM Spectrum Scale v4.2.1 to latest PTF level ? Update of Intel OPA from 10.1 to 10.2 (incl. performance fixes) ? 
Refresh of server and adapter FW levels to Scalable Infrastructure ?16C? recommended levels ? Not much news this time, as ?16C? FW is almost identical to ?16B - List GPFS RPM gpfs.adv-4.2.1-2.12.x86_64.rpm gpfs.base-4.2.1-2.12.x86_64.rpm gpfs.callhome-4.2.1-1.000.el7.noarch.rpm gpfs.callhome-ecc-client-4.2.1-1.000.noarch.rpm gpfs.crypto-4.2.1-2.12.x86_64.rpm gpfs.docs-4.2.1-2.12.noarch.rpm gpfs.ext-4.2.1-2.12.x86_64.rpm gpfs.gnr-4.2.1-2.12.x86_64.rpm gpfs.gnr.base-1.0.0-0.x86_64.rpm gpfs.gpl-4.2.1-2.12.noarch.rpm gpfs.gskit-8.0.50-57.x86_64.rpm gpfs.gss.firmware-4.2.0-5.x86_64.rpm gpfs.gss.pmcollector-4.2.2-2.el7.x86_64.rpm gpfs.gss.pmsensors-4.2.2-2.el7.x86_64.rpm gpfs.gui-4.2.1-2.3.noarch.rpm gpfs.java-4.2.2-2.x86_64.rpm gpfs.msg.en_US-4.2.1-2.12.noarch.rpm -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From Greg.Lehmann at csiro.au Tue Jul 11 05:54:39 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Tue, 11 Jul 2017 04:54:39 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: <4c9ae144c1114b85b7f2cdc27eefd749@exch1-cdc.nexus.csiro.au> Yes, although it is early days for us and I would not say we have finished testing as yet. We have upgraded twice to get there from 4.2.3-0. It seems OK and I have not noticed any changes from 4.2.3.0. Greg From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: Friday, 7 July 2017 10:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Just following up on this, has anyone successfully deployed Protocols (SMB) on RHEL 7.3 with the 4.2.3-2 packages? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 July 2017 12:12 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. 
This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Tue Jul 11 10:36:39 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Tue, 11 Jul 2017 09:36:39 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA Message-ID: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 From abeattie at au1.ibm.com Tue Jul 11 11:14:37 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Tue, 11 Jul 2017 10:14:37 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Message-ID: An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Jul 11 15:46:42 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Jul 2017 14:46:42 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Message-ID: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Tuesday, July 11, 2017 5:15 AM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org; jake.carroll at uq.edu.au Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA Bilich, Reach out to Jake Carrol at Uni of QLD UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet and there is LOTS of tuning that you can do to improve how things work Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Billich Heinrich Rainer (PSI)" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [gpfsug-discuss] does AFM support NFS via RDMA Date: Tue, Jul 11, 2017 7:36 PM Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? 
I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jake.carroll at uq.edu.au Tue Jul 11 22:38:43 2017 From: jake.carroll at uq.edu.au (Jake Carroll) Date: Tue, 11 Jul 2017 21:38:43 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Message-ID: <72D0CC62-8663-4072-AFA1-735D75EEBBE1@uq.edu.au> I?ll be there! From: Bryan Banister Date: Wednesday, 12 July 2017 at 12:46 am To: gpfsug main discussion list Cc: Jake Carroll Subject: RE: [gpfsug-discuss] does AFM support NFS via RDMA Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Tuesday, July 11, 2017 5:15 AM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org; jake.carroll at uq.edu.au Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA Bilich, Reach out to Jake Carrol at Uni of QLD UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet and there is LOTS of tuning that you can do to improve how things work Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Billich Heinrich Rainer (PSI)" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [gpfsug-discuss] does AFM support NFS via RDMA Date: Tue, Jul 11, 2017 7:36 PM Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? 
I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Tue Jul 11 23:07:49 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 11 Jul 2017 15:07:49 -0700 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Message-ID: <9BA6A8E3-D633-4DFF-826F-5ACE49361694@lbl.gov> Sounds good. Is someone willing to take on this talk? User-driven talks on real experiences are always welcome. Cheers, Kristy > On Jul 11, 2017, at 7:46 AM, Bryan Banister wrote: > > Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? > -B > > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org ] On Behalf Of Andrew Beattie > Sent: Tuesday, July 11, 2017 5:15 AM > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org ; jake.carroll at uq.edu.au > Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA > > Bilich, > > Reach out to Jake Carrol at Uni of QLD > > UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet > and there is LOTS of tuning that you can do to improve how things work > > Regards, > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: "Billich Heinrich Rainer (PSI)" > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org " > > Cc: > Subject: [gpfsug-discuss] does AFM support NFS via RDMA > Date: Tue, Jul 11, 2017 7:36 PM > > Hello, > > We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? 
> > We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. > > We run spectrum scale 4.2.2/4.2.3 on Redhat 7. > > Thank you, > > Heiner Billich > > -- > Paul Scherrer Institut > Heiner Billich > WHGA 106 > CH 5232 Villigen > 056 310 36 02 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 12 17:06:40 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 12 Jul 2017 16:06:40 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Message-ID: Interesting. Performance is one thing, but how usable. IBM, watch your back :-) ?WekaIO is the world?s fastest distributed file system, processing four times the workload compared to IBM Spectrum Scale measured on Standard Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. Utilizing only 120 cloud compute instances with locally attached storage, WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM?s high-end FlashSystem 900.? https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Jul 12 18:24:19 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 12 Jul 2017 17:24:19 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: while i really like competition on SpecSFS, the claims from the WekaIO people are lets say 'alternative facts' at best The Spectrum Scale results were done on 4 Nodes with 2 Flash Storage devices attached, they compare this to a WekaIO system with 14 times more memory (14 TB vs 1TB) , 120 SSD's (vs 64 Flashcore Modules) across 15 times more compute nodes (60 vs 4) . said all this, the article claims 1000 builds, while the actual submission only delivers 500 --> https://www.spec.org/sfs2014/results/sfs2014.html so they need 14 times more memory and cores and 2 times flash to show twice as many builds at double the response time, i leave this to everybody who understands this facts to judge how great that result really is. 
Said all this, Spectrum Scale scales almost linear if you double the nodes , network and storage accordingly, so there is no reason to believe we couldn't easily beat this, its just a matter of assemble the HW in a lab and run the test. btw we scale to 10k+ nodes , 2500 times the number we used in our publication :-D Sven On Wed, Jul 12, 2017 at 9:06 AM Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > Interesting. Performance is one thing, but how usable. IBM, watch your > back :-) > > > > *?WekaIO is the world?s fastest distributed file system, processing four > times the workload compared to IBM Spectrum Scale measured on Standard > Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry > benchmark. Utilizing only 120 cloud compute instances with locally attached > storage, WekaIO completed 1,000 simultaneous software builds compared to > 240 on IBM?s high-end FlashSystem 900.?* > > > > > https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 <(507)%20269-0413> > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Jul 12 19:20:06 2017 From: ewahl at osc.edu (Edward Wahl) Date: Wed, 12 Jul 2017 14:20:06 -0400 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: <20170712142006.297cc9f2@osc.edu> Ah benchmarks... There are Lies, damn Lies, and then benchmarks. I've been in HPC a while on both the vendor (Cray) and customer side, and until I see Lustre, BeeGFS, Spectrum Scale, StorNext, OrangeFS, CEPH, Gluster, 'Flash in the pan v1', etc. all run on the EXACT same hardware I take ALL benchmarks with a POUND of salt. Too easy to finagle whatever result you want. Besides, benchmarks and real world performance are vastly different unless you are using IO kernels based on your local apps as your benchmark. I have a feeling MANY of the folks on this list feel similarly. ;) I recall when we figured out how someone cheated a SPEC test once by only using the inner-track of drives. ^_^ Ed On Wed, 12 Jul 2017 16:06:40 +0000 "Oesterlin, Robert" wrote: > Interesting. Performance is one thing, but how usable. IBM, watch your > back :-) > > ?WekaIO is the world?s fastest distributed file system, processing four times > the workload compared to IBM Spectrum Scale measured on Standard Performance > Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. > Utilizing only 120 cloud compute instances with locally attached storage, > WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM?s > high-end FlashSystem 900.? > > https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From r.sobey at imperial.ac.uk Wed Jul 12 19:20:32 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 12 Jul 2017 18:20:32 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: I'm reading it as "WeakIO" which probably isn't a good thing.. 
both in the context of my eyesight and the negative connotation of the product :) ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Oesterlin, Robert Sent: 12 July 2017 17:06 To: gpfsug main discussion list Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Interesting. Performance is one thing, but how usable. IBM, watch your back :-) ?WekaIO is the world?s fastest distributed file system, processing four times the workload compared to IBM Spectrum Scale measured on Standard Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. Utilizing only 120 cloud compute instances with locally attached storage, WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM?s high-end FlashSystem 900.? https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 12 19:27:12 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 12 Jul 2017 18:27:12 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Message-ID: <92349D18-3614-4235-B30C-ADCCE3782CDD@nuance.com> Ah yes - Sven keeping us honest! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Wednesday, July 12, 2017 at 12:24 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System while i really like competition on SpecSFS, the claims from the WekaIO people are lets say 'alternative facts' at best The Spectrum Scale results were done on 4 Nodes with 2 Flash Storage devices attached, they compare this to a WekaIO system with 14 times more memory (14 TB vs 1TB) , 120 SSD's (vs 64 Flashcore Modules) across 15 times more compute nodes (60 vs 4) . said all this, the article claims 1000 builds, while the actual submission only delivers 500 --> https://www.spec.org/sfs2014/results/sfs2014.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From sannaik2 at in.ibm.com Fri Jul 14 06:55:30 2017 From: sannaik2 at in.ibm.com (Sandeep Naik1) Date: Fri, 14 Jul 2017 11:25:30 +0530 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: , <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: Hi Atmane, There can be two meaning of available free space? One what is available on existing filesystem. For this you rightly referred to df -h command o/p. This is the actual free space available in already created filesystem. Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 The other is free space available in DA. For which as every one said use mmlsrecoverygroup -L Please note that is will give you raw free capacity. For usable free capacity in DA you have to add RAID over head. But based on your o/p you have very little/no free space left in DA. 
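For example, both views can be gathered in one pass with a rough sketch like the one below. It uses only the commands already mentioned in this thread (mmlsrecoverygroup -L and mmdf); the recovery group and file system names are taken from your output further down, so adjust them for another cluster:

    # Raw free space per declustered array, for each recovery group:
    for rg in BB1RGL BB1RGR; do
        echo "=== $rg ==="
        mmlsrecoverygroup $rg -L | grep -E '^ *(LOG|DA[0-9]+) '   # per-DA summary rows (free space column)
    done

    # Usable, file-system level capacity (i.e. after RAID overhead), per file system:
    for fs in gpfs1 gpfs2 gpfs3 gpfs4 gpfs5; do
        echo "=== $fs ==="
        mmdf $fs
    done
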
[root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low Thanks, Sandeep Naik Elastic Storage server / GPFS Test ETZ-B, Hinjewadi Pune India (+91) 8600994314 From: "Kumaran Rajaram" To: gpfsug main discussion list , atmane khiredine Date: 09/07/2017 10:22 PM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Atmane, >> I can not find the free space Based on your output below, your setup currently has two recovery groups BB1RGL and BB1RGR. Issue "mmlsrecoverygroup BB1RGL -L" and "mmlsrecoverygroup BB1RGR -L" to obtain free space in each DA. Based on your "mmlsrecoverygroup BB1RGL -L" output below, BB1RGL "DA1" has 12GiB and "DA2" has 4GiB free space. The metadataOnly and dataOnly vdisk/NSD are created from DA1 and DA2. declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low In addition, you may use "mmlsnsd" to obtain mapping of file-system to vdisk/NSD + use "mmdf " command to query user or available capacity on a GPFS file system. Hope this helps, -Kums From: atmane khiredine To: Laurence Horrocks-Barlow , "gpfsug main discussion list" Date: 07/09/2017 08:27 AM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org thank you very much for replying. 
I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 
drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jul 17 13:13:58 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 17 Jul 2017 12:13:58 +0000 Subject: [gpfsug-discuss] Job Vacancy: Research Storage Systems Senior Specialist/Specialist Message-ID: Hi all, Members of this group may be particularly interested in the role "Research Storage Systems Senior Specialist/Specialist"... As part of the University of Birmingham's investment in our ability to support outstanding research by providing technical computing facilities, we are expanding the team and currently have 6 vacancies. I've provided a short description of each post, but please do follow the links where you will find the full job description attached at the bottom of the page. For some of the posts, they are graded either at 7 or 8 and will be appointed based upon skills and experience, the expectation is that if the appointment is made at grade 7 that as the successful candidate grows into the role, we should be able to regrade up. 
Research Storage Systems Senior Specialist/Specialist: https://goo.gl/NsL1EG Responsible for the delivery and maintenance of research storage systems, focussed on the delivery of Spectrum Scale storage systems and data protection. (this is available either as a grade 8 or grade 7 post depending on skills and experience so may suit someone wishing to grow into the senior role) HPC Specialist post (Research Systems Administrator / Senior Research Systems Administrator): https://goo.gl/1SxM4j Helping to deliver and operationally support the technical computing environments, with a focus on supporting and delivery of HPC and HTC services. (this is available either as a grade 7 or grade 8 post depending on skills and experience so may suit someone wishing to grow into the senior role) Research Computing (Analytics): https://goo.gl/uCNdMH Helping our researchers to understand data analytics and supporting their research Senior Research Software Engineer: https://goo.gl/dcGgAz Working with research groups to develop and deliver bespoke software solutions to support their research Research Training and Engagement Officer: https://goo.gl/U48m7z Helping with the delivery and coordination of training and engagement works to support users helping ensure they are able to use the facilities to support their research. Research IT Partner in the College of Arts and Law: https://goo.gl/A7czEA Providing technical knowledge and skills to support project delivery through research bid preparation to successful solution delivery. Simon From cgirda at wustl.edu Mon Jul 17 20:40:42 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Mon, 17 Jul 2017 14:40:42 -0500 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana Message-ID: Hello Team, This is Chakri from Washu at STL. Thank you for the great opportunity to join this group. I am trying to setup performance monitoring for our GPFS cluster. As part of the project configured pmcollector and pmsensors on our GPFS cluster. 1. Created a 'spectrumscale' data-source bridge on our grafana ( NOT SET TO DEFAULT ) https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm 2. Created a new dash-board by importing the pre-built dashboard. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Importing%20predefined%20Grafana%20dashboards Here is the issue. I don't get any graph updates if I don't set "spectrumscale" as DEFAULT data-source but that is breaking rest of the graphs ( we have ton of dashboards). So I had to uncheck the "spectrumscale" as default data-source. If I go and set the "data-source" manually to "spectrumscale" on the pre-built dashboard graphs. I see the wheel spinning but no updates. Any ideas? Thank you Chakri From Robert.Oesterlin at nuance.com Tue Jul 18 12:45:38 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 18 Jul 2017 11:45:38 +0000 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana Message-ID: Hi Chakri If you?re getting the ?ole ?spinning wheel? on your dashboard, then it?s one of two things: 1) The Grafana bridge is not running 2) The dashboard is requesting a metric that isn?t available. 
Assuming that you?ve verified that the pmcollector/pmsensor setup is work right in your cluster, I?d then start looking at the log files for the Grafana Bridge and the pmcollector to see if you can determine if either is producing an error - like the metric wasn?t found. The other thing to try is setup a small test graph with a known metric being collected by you pmsensor configuration, rather than try one of Helene?s default dashboards, which are fairly complex. Drop me a note directly if you need to. Bob Oesterlin Sr Principal Storage Engineer, Nuance From cgirda at wustl.edu Tue Jul 18 15:57:05 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Tue, 18 Jul 2017 09:57:05 -0500 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana In-Reply-To: References: Message-ID: Bob, Found the issue to be with https is getting blocked with "direct" connection. Switched it to proxy on the bridge-port. That helped and now I can see graphs. Thank you Chakri On 7/18/17 6:45 AM, Oesterlin, Robert wrote: > Hi Chakri > > If you?re getting the ?ole ?spinning wheel? on your dashboard, then it?s one of two things: > > 1) The Grafana bridge is not running > 2) The dashboard is requesting a metric that isn?t available. > > Assuming that you?ve verified that the pmcollector/pmsensor setup is work right in your cluster, I?d then start looking at the log files for the Grafana Bridge and the pmcollector to see if you can determine if either is producing an error - like the metric wasn?t found. The other thing to try is setup a small test graph with a known metric being collected by you pmsensor configuration, rather than try one of Helene?s default dashboards, which are fairly complex. > > Drop me a note directly if you need to. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Tue Jul 18 18:21:06 2017 From: david_johnson at brown.edu (David Johnson) Date: Tue, 18 Jul 2017 13:21:06 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited Message-ID: We also noticed a fair amount of CPU time accumulated by mmsysmon.py on our diskless compute nodes. I read the earlier query, where it was answered: > ces == Cluster Export Services, mmsysmon.py comes from mmcesmon. It is used for managing export services of GPFS. If it is killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't attempt to kill them. > Our question is this ? we don?t run the latest ?protocols", our NFS is CNFS, and our CIFS is clustered CIFS. I can understand it might be needed with Ganesha, but on every node? Why in the world would I be getting this daemon running on all client nodes, when I didn?t install the ?protocols" version of the distribution? We have release 4.2.2 at the moment. How can we disable this? Thanks, ? ddj -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jul 18 18:51:21 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 18 Jul 2017 17:51:21 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: There?s no official way to cleanly disable it so far as I know yet; but you can defacto disable it by deleting /var/mmfs/mmsysmon/mmsysmonitor.conf. It?s a huge problem. 
I don?t understand why it hasn?t been given much credit by dev or support. ~jonathon On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of David Johnson" wrote: We also noticed a fair amount of CPU time accumulated by mmsysmon.py on our diskless compute nodes. I read the earlier query, where it was answered: ces == Cluster Export Services, mmsysmon.py comes from mmcesmon. It is used for managing export services of GPFS. If it is killed, your nfs/smb etc will be out of work. Their overhead is small and they are very important. Don't attempt to kill them. Our question is this ? we don?t run the latest ?protocols", our NFS is CNFS, and our CIFS is clustered CIFS. I can understand it might be needed with Ganesha, but on every node? Why in the world would I be getting this daemon running on all client nodes, when I didn?t install the ?protocols" version of the distribution? We have release 4.2.2 at the moment. How can we disable this? Thanks, ? ddj From S.J.Thompson at bham.ac.uk Tue Jul 18 20:21:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 18 Jul 2017 19:21:46 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: So just following up on my questions from January. We tried to do 2. I.e. Restore to a new file-system with different block sizes. It got part way through creating the file-sets on the new SOBAR file-system and then GPFS asserts and crashes... We weren't actually intentionally trying to move block sizes, but because we were restoring from a traditional SAN based system to a shiny new GNR based system, we'd manually done the FS create steps. I have a PMR open now. I don't know if someone internally in IBM actually tried this after my emails, as apparently there is a similar internal defect which is ~6 months old... Simon From: > on behalf of Marc A Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 20 January 2017 at 17:57 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. From: "Simon Thompson (Research Computing - IT Services)" > To: "gpfsug-discuss at spectrumscale.org" > Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? 
We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From leslie.james.elliott at gmail.com Wed Jul 19 08:22:49 2017 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Wed, 19 Jul 2017 17:22:49 +1000 Subject: [gpfsug-discuss] AFM over NFS Message-ID: we are having a problem linking a target to a fileset we are able to manually connect with NFSv4 to the correct path on an NFS export down a particular subdirectory path, but when when we create a fileset with this same path as an afmTarget it connects with NFSv3 and actually connects to the top of the export even though mmafmctl displays the extended path information are we able to tell AFM to connect with NFSv4 in any way to work around this problem the NFS comes from a closed system, we can not change the configuration on it to fix the problem on the target thanks leslie -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Wed Jul 19 08:53:58 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 19 Jul 2017 07:53:58 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: I?m having a play with this now too. Has anybody coded a systemd unit to handle step 2b in the knowledge centre article ? bridge creation on the gpfs side? It would save me a bit of effort. I?m also wondering about the CherryPy version. It looks like this has been developed on SLES which has the newer version mentioned as a standard package and yet RHEL with an older version of CherryPy is perhaps more common as it seems to have the best support for features of GPFS, like object and block protocols. Maybe SLES is in favour now? 
Cheers, Greg From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Thursday, 6 July 2017 3:07 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Greetings, I'm currently setting up Grafana to interact with one of our Scale Clusters and i've followed the knowledge centre link in terms of setup. https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm However while everything appears to be working i'm not seeing any data coming through the reports within the grafana server, even though I can see data in the Scale GUI The current environment: [root at sc01n02 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: sc01.spectrum GPFS cluster id: 18085710661892594990 GPFS UID domain: sc01.spectrum Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------ 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon [root at sc01n02 ~]# [root at sc01n02 ~]# mmlsconfig Configuration data for cluster sc01.spectrum: --------------------------------------------- clusterName sc01.spectrum clusterId 18085710661892594990 autoload yes profile gpfsProtocolDefaults dmapiFileHandleSize 32 minReleaseLevel 4.2.2.0 ccrEnabled yes cipherList AUTHONLY maxblocksize 16M [cesNodes] maxMBpS 5000 numaMemoryInterleave yes enforceFilesetQuotaOnRoot yes workerThreads 512 [common] tscCmdPortRange 60000-61000 cesSharedRoot /ibm/cesSharedRoot/ces cifsBypassTraversalChecking yes syncSambaMetadataOps yes cifsBypassShareLocksOnRename yes adminMode central File systems in cluster sc01.spectrum: -------------------------------------- /dev/cesSharedRoot /dev/icos_demo /dev/scale01 [root at sc01n02 ~]# [root at sc01n02 ~]# systemctl status pmcollector ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. Loaded: loaded (/etc/rc.d/init.d/pmcollector) Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago Docs: man:systemd-sysv-generator(8) Main PID: 2693 (ZIMonCollector) CGroup: /system.slice/pmcollector.service ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance mon...... May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor collector... May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance moni...r.. Hint: Some lines were ellipsized, use -l to show in full. From Grafana Server: [cid:image002.jpg at 01D300B7.CFE73E50] when I send a set of files to the cluster (3.8GB) I can see performance metrics within the Scale GUI [cid:image004.jpg at 01D300B7.CFE73E50] yet from the Grafana Dashboard im not seeing any data points [cid:image006.jpg at 01D300B7.CFE73E50] Can anyone provide some hints as to what might be happening? Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 19427 bytes Desc: image002.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 84412 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 37285 bytes Desc: image006.jpg URL: From janfrode at tanso.net Wed Jul 19 12:09:48 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 19 Jul 2017 11:09:48 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: Nils Haustein did such a migration from v7000 Unified to ESS last year. Used SOBAR to avoid recalls from HSM. I believe he wrote a whitepaper on the process.. -jf tir. 18. jul. 2017 kl. 21.21 skrev Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk>: > So just following up on my questions from January. > > We tried to do 2. I.e. Restore to a new file-system with different block > sizes. It got part way through creating the file-sets on the new SOBAR > file-system and then GPFS asserts and crashes... We weren't actually > intentionally trying to move block sizes, but because we were restoring > from a traditional SAN based system to a shiny new GNR based system, we'd > manually done the FS create steps. > > I have a PMR open now. I don't know if someone internally in IBM actually > tried this after my emails, as apparently there is a similar internal > defect which is ~6 months old... > > Simon > > From: on behalf of Marc A > Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: Friday, 20 January 2017 at 17:57 > > To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions > > I worked on some aspects of SOBAR, but without studying and testing the > commands - I'm not in a position right now to give simple definitive > answers - > having said that.... > > Generally your questions are reasonable and the answer is: "Yes it should > be possible to do that, but you might be going a bit beyond the design > point.., > so you'll need to try it out on a (smaller) test system with some smaller > tedst files. > > Point by point. > > 1. If SOBAR is unable to restore a particular file, perhaps because the > premigration did not complete -- you should only lose that particular file, > and otherwise "keep going". > > 2. I think SOBAR helps you build a similar file system to the original, > including block sizes. So you'd have to go in and tweak the file system > creation step(s). > I think this is reasonable... If you hit a problem... IMO that would be a > fair APAR. > > 3. Similar to 2. > > > > > > From: "Simon Thompson (Research Computing - IT Services)" < > S.J.Thompson at bham.ac.uk> > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 01/20/2017 10:44 AM > Subject: [gpfsug-discuss] SOBAR questions > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We've recently been looking at deploying SOBAR to support DR of some of > our file-systems, I have some questions (as ever!) that I can't see are > clearly documented, so was wondering if anyone has any insight on this. > > 1. If we elect not to premigrate certain files, are we still able to use > SOBAR? 
We are happy to take a hit that those files will never be available > again, but some are multi TB files which change daily and we can't stream > to tape effectively. > > 2. When doing a restore, does the block size of the new SOBAR'd to > file-system have to match? For example the old FS was 1MB blocks, the new > FS we create with 2MB blocks. Will this work (this strikes me as one way > we might be able to migrate an FS to a new block size?)? > > 3. If the file-system was originally created with an older GPFS code but > has since been upgraded, does restore work, and does it matter what client > code? E.g. We have a file-system that was originally 3.5.x, its been > upgraded over time to 4.2.2.0. Will this work if the client code was say > 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 > (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file > system version". Say there was 4.2.2.5 which created version 16.01 > file-system as the new FS, what would happen? > > This sort of detail is missing from: > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s > cale.v4r22.doc/bl1adv_sobarrestore.htm > > But is probably quite important for us to know! > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 19 12:26:43 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 19 Jul 2017 11:26:43 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: Getting this: python zimonGrafanaIntf.py ?s < pmcollector host> via system is a bit of a tricky process, since this process will abort unless the pmcollector is fully up. With a large database, I?ve seen it take 3-5 mins for pmcollector to fully initialize. I?m sure a simple ?sleep and try again? wrapper would take care of that. It?s on my lengthy to-do list! On the CherryPy version - I run the bridge on my RH/Centos system with python 3.4 and used ?pip install cherrypy? and it picked up the latest version. Seems to work just fine. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Greg.Lehmann at csiro.au" Reply-To: gpfsug main discussion list Date: Wednesday, July 19, 2017 at 2:54 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data I?m having a play with this now too. Has anybody coded a systemd unit to handle step 2b in the knowledge centre article ? bridge creation on the gpfs side? It would save me a bit of effort. I?m also wondering about the CherryPy version. It looks like this has been developed on SLES which has the newer version mentioned as a standard package and yet RHEL with an older version of CherryPy is perhaps more common as it seems to have the best support for features of GPFS, like object and block protocols. Maybe SLES is in favour now? -------------- next part -------------- An HTML attachment was scrubbed... 
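To expand on the "sleep and try again" idea: a minimal wrapper (untested sketch; the bridge path is an assumption, and 9084 is the collector query port the bridge complains about when it cannot connect) can simply poll the collector port before launching the bridge:

#!/bin/bash
# wait-for-pmcollector.sh <collector-host>  -- hypothetical helper around zimonGrafanaIntf.py
COLLECTOR=${1:-localhost}
# loop until the pmcollector query port (9084) accepts a TCP connection
until (exec 3<>"/dev/tcp/${COLLECTOR}/9084") 2>/dev/null; do
    echo "pmcollector on ${COLLECTOR} not answering on 9084 yet, sleeping 30s..."
    sleep 30
done
# collector is up, hand over to the bridge
exec python /opt/IBM/zimon/zimonGrafanaIntf/zimonGrafanaIntf.py -s "${COLLECTOR}"

The same effect can be had from systemd alone with Restart=on-failure and a RestartSec of a minute or so, since the bridge simply exits when it cannot reach the collector.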
URL: From MDIETZ at de.ibm.com Wed Jul 19 14:05:49 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Wed, 19 Jul 2017 15:05:49 +0200 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. > > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? 
ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Wed Jul 19 14:28:23 2017 From: david_johnson at brown.edu (David Johnson) Date: Wed, 19 Jul 2017 09:28:23 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: I have opened a PMR, and the official response reflects what you just posted. In addition, it seems there are some performance issues with Python 2 that will be improved with eventual migration to Python 3. I was unaware of the mmhealth functions that the mmsysmon daemon provides. The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded. I suppose it would be possible to turn off mmsysmon during the benchmarking, but I appreciate the effort at streamlining the monitor service. Cutting back on fork/exec, better python, less polling, more notifications? all good. Thanks for the details, ? ddj > On Jul 19, 2017, at 9:05 AM, Mathias Dietz wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. 
> > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdharris at us.ibm.com Wed Jul 19 15:40:17 2017 From: mdharris at us.ibm.com (Michael D Harris) Date: Wed, 19 Jul 2017 10:40:17 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: Hi David, Re: "The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded." MPI workloads show the most mmhealth impact. Specifically the more sensitive the workload is to jitter the higher the potential impact. The mmhealth config interval, as per Mathias's link, is a scalar applied to all monitor interval values in the configuration file. As such it currently modifies the server side monitoring and health reporting in addition to mitigating mpi client impact. So "medium" == 5 is a good perhaps reasonable value - whereas the "slow" == 10 scalar may be too infrequent for your server side monitoring and reporting (so your 30 second update becomes 5 minutes). The clock alignment that Mathias mentioned is a new investigatory undocumented tool for MPI workloads. It nearly completely removes all mmhealth MPI jitter while retaining default monitor intervals. It also naturally generates thundering herds of all client reporting to the quorum nodes. So while you may mitigate the client MPI jitter you may severely impact the server throughput on those intervals if not also exceed connection and thread limits. Configuring "clients" separately from "servers" without resorting to alignment is another area of investigation. I'm not familiar with your PMR but as Mathias mentioned "mmhealth config interval medium" would be a good start. In testing that Kums and I have done the "mmhealth config interval medium" value provides mitigation almost as good as the mentioned clock alignment for MPI for say a psnap with barrier type workload . 
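For reference, the interval change itself is a one-liner; "medium" and "slow" are the scalars discussed above (5x and 10x the default polling intervals) - check the mmhealth man page on your release for the exact set of accepted keywords:

# scale all mmsysmon polling intervals by 5 (the suggested starting point)
mmhealth config interval medium

# 10x polling: less client-side jitter, but server-side health reporting
# becomes correspondingly less frequent
mmhealth config interval slow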
Regards, Mike Harris IBM Spectrum Scale - Core Team From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/19/2017 09:28 AM Subject: gpfsug-discuss Digest, Vol 66, Issue 30 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: mmsysmon.py revisited (Mathias Dietz) 2. Re: mmsysmon.py revisited (David Johnson) ---------------------------------------------------------------------- Message: 1 Date: Wed, 19 Jul 2017 15:05:49 +0200 From: "Mathias Dietz" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited Message-ID: Content-Type: text/plain; charset="iso-8859-1" thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. 
> > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/8c0e33e9/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 19 Jul 2017 09:28:23 -0400 From: David Johnson To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited Message-ID: Content-Type: text/plain; charset="utf-8" I have opened a PMR, and the official response reflects what you just posted. In addition, it seems there are some performance issues with Python 2 that will be improved with eventual migration to Python 3. I was unaware of the mmhealth functions that the mmsysmon daemon provides. The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded. I suppose it would be possible to turn off mmsysmon during the benchmarking, but I appreciate the effort at streamlining the monitor service. Cutting back on fork/exec, better python, less polling, more notifications? all good. Thanks for the details, ? ddj > On Jul 19, 2017, at 9:05 AM, Mathias Dietz wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. 
(mmhealth config interval) > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm < https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss < http://gpfsug.org/mailman/listinfo/gpfsug-discuss> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/669c525b/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 66, Issue 30 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathon.anderson at colorado.edu Wed Jul 19 18:52:14 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 19 Jul 2017 17:52:14 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? ~jonathon On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. 
> > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Wed Jul 19 19:12:37 2017 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Wed, 19 Jul 2017 14:12:37 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. 
> > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. >> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. >> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Wed Jul 19 19:29:22 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 19 Jul 2017 18:29:22 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> References: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> Message-ID: OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU for packet processing, whereas Mellanox IB uses a discrete asic on the HBA. As a result, OPA is much more sensitive to task placement and interrupts, in our experience, because the host CPU load competes with the fabric IO processing load. 
~jonathon On 7/19/17, 12:12 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu" wrote: We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. 
>> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. >> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From john.hearns at asml.com Thu Jul 20 08:39:29 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 20 Jul 2017 07:39:29 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> Message-ID: This is really interesting. I know we can look at the interrupt rates of course, but is there a way we can quantify the effects of interrupts / OS jitter here? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: Wednesday, July 19, 2017 8:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU for packet processing, whereas Mellanox IB uses a discrete asic on the HBA. As a result, OPA is much more sensitive to task placement and interrupts, in our experience, because the host CPU load competes with the fabric IO processing load. ~jonathon On 7/19/17, 12:12 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu" wrote: We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. 
Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_4.2.3%2Fcom.ibm.spectrum.scale.v4r23.doc%2Fbl1adm_mmhealth.htm&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=Uzdg4ogcQwidNfi8TMp%2FdCMqnSLTFxU4y8n2ub%2F28xQ%3D&reserved=0 > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. >> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. 
>> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From MDIETZ at de.ibm.com Thu Jul 20 10:30:50 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Thu, 20 Jul 2017 11:30:50 +0200 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: Jonathon, its important to separate the two issues "high CPU consumption" and "CPU Jitter". As mentioned, we are aware of the CPU jitter issue and already put several improvements in place. (more to come with the next release) Did you try with a lower polling frequency and/or enabling clock alignment as Mike suggested ? Non-MPI workloads are usually not impacted by CPU jitter, but might be impacted by high CPU consumption. 
But we don't see such such high CPU consumption in the lab and therefore ask affected customers to get in contact with IBM support to find the root cause. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/19/2017 07:52:14 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/19/2017 07:52 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > It might be a problem specific to your system environment or a > wrong configuration therefore please get in contact with IBM support > to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing > jitter in conflict with MPI on the shared Intel Omni-Path network, > in our case. > > We?ve already tried pursuing support on this through our vendor, > DDN, and got no-where. Eventually we were the ones who tried killing > mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU > consumption by mmsysmon on our test systems? isn?t helping. Do you > have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Mathias Dietz" on behalf of MDIETZ at de.ibm.com> wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for > the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because > it monitors the individual components and provides health state > information and error events. > > This information is needed by other Spectrum Scale components > (mmhealth command, the IBM Spectrum Scale GUI, Support tools, > Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > > much credit by dev or support. > > Over the last couple of month, the development team has put a > strong focus on this topic. > > In order to monitor the health of the individual components, > mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and > replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the > ability to configure the polling frequency to reduce the overhead. > (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/ > com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the > monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by > mmsysmon on our test systems. > > It might be a problem specific to your system environment or a > wrong configuration therefore please get in contact with IBM support > to analyze the root cause of the high usage. 
> > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by > mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on > every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 15:57:14 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 09:57:14 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR Message-ID: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> Hi There, I was running a bridge port services to push my stats to grafana. It was running fine until we started some rigorous IOPS testing on the cluster. Now its failing to start with the following error. Questions: 1. Any clues on it fix? 2. Is there anyway I can run this in a service/daemon mode rather than running in a screen session? 
[root at linuscs107 zimonGrafanaIntf]# python zimonGrafanaIntf.py -s linuscs107.gsc.wustl.edu Failed to initialize MetadataHandler, please check log file for reason #cat pmmonitor.log 2017-07-20 09:41:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 09:41:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 09:41:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting Thank you Chakri From Robert.Oesterlin at nuance.com Thu Jul 20 16:06:48 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 20 Jul 2017 15:06:48 +0000 Subject: [gpfsug-discuss] mmsysmon and CCR Message-ID: I recently ran into an issue where the frequency of mmsysmon polling (GPFS 4.2.2) was causing issues with CCR updates. I eventually ended decreasing the polling interval to 30 mins (I don?t have any CES) which seemed to solve the issue. So, if you have a large cluster, be on the lookout for CCR issues, if you have that configured. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 17:38:25 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 11:38:25 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> References: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> Message-ID: <31b9b441-f51c-c0d1-11e0-b01a070f9e4e@wustl.edu> cat zserver.log 2017-07-20 11:21:59,001 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) 2017-07-20 11:32:29,090 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) Thank you Chakri On 7/20/17 9:57 AM, Chakravarthy Girda wrote: > Hi There, > > I was running a bridge port services to push my stats to grafana. It > was running fine until we started some rigorous IOPS testing on the > cluster. Now its failing to start with the following error. > > Questions: > > 1. Any clues on it fix? > 2. Is there anyway I can run this in a service/daemon mode rather than > running in a screen session? > > > [root at linuscs107 zimonGrafanaIntf]# python zimonGrafanaIntf.py -s > linuscs107.gsc.wustl.edu > Failed to initialize MetadataHandler, please check log file for reason > > #cat pmmonitor.log > > 2017-07-20 09:41:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 09:41:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 09:41:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > > > Thank you > Chakri > > > > > From Robert.Oesterlin at nuance.com Thu Jul 20 17:50:12 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 20 Jul 2017 16:50:12 +0000 Subject: [gpfsug-discuss] pmmonitor - ERROR Message-ID: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> This looks like the Grafana bridge could not connect to the pmcollector process - is it running normally? See if some of the normal ?mmprefmon? 
commands work and/or look at the log file on the pmcollector node. (under /var/log/zimon) You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) Bob Oesterlin Sr Principal Storage Engineer, Nuance On 7/20/17, 11:38 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Chakravarthy Girda" wrote: 2017-07-20 11:32:29,090 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) From mdharris at us.ibm.com Thu Jul 20 17:55:56 2017 From: mdharris at us.ibm.com (Michael D Harris) Date: Thu, 20 Jul 2017 12:55:56 -0400 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 66, Issue 34 In-Reply-To: References: Message-ID: Hi Bob, The CCR monitor interval is addressed in 4.2.3 or 4.2.3 ptf1 Regards, Mike Harris Spectrum Scale Development - Core Team -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 18:12:09 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 12:12:09 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> Message-ID: Bob, Your correct. Found the issues with pmcollector services. Fixed issues with pmcollector, resolved the issues. Thank you Chakri On 7/20/17 11:50 AM, Oesterlin, Robert wrote: > You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) From cgirda at wustl.edu Thu Jul 20 18:30:03 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 12:30:03 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> Message-ID: <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> Bob, Actually the pmcollector service died in 5min. 2017-07-20 12:11:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:16:29,470 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:16:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:21:29,384 - pmmonitor - ERROR - QueryHandler: Socket connection broken, received no data 2017-07-20 12:21:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 12:21:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 12:21:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting Thank you Chakri On 7/20/17 12:12 PM, Chakravarthy Girda wrote: > Bob, > > Your correct. Found the issues with pmcollector services. Fixed issues > with pmcollector, resolved the issues. > > > Thank you > > Chakri > > > On 7/20/17 11:50 AM, Oesterlin, Robert wrote: >> You will also see this node when the pmcollector process is still initializing. 
(reading in the existing database, not ready to service requests) From cgirda at wustl.edu Thu Jul 20 21:03:56 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 15:03:56 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> Message-ID: For now I switched the "zimonGrafanaIntf" to port "4262". So far it didn't crash the pmcollector. Will wait for some more time to ensure its working. * Can we start this process in a daemon or service mode? Thank you Chakri On 7/20/17 12:30 PM, Chakravarthy Girda wrote: > Bob, > > Actually the pmcollector service died in 5min. > > 2017-07-20 12:11:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:16:29,470 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:16:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:21:29,384 - pmmonitor - ERROR - QueryHandler: Socket > connection broken, received no data > 2017-07-20 12:21:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 12:21:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 12:21:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > > Thank you > Chakri > > > On 7/20/17 12:12 PM, Chakravarthy Girda wrote: >> Bob, >> >> Your correct. Found the issues with pmcollector services. Fixed issues >> with pmcollector, resolved the issues. >> >> >> Thank you >> >> Chakri >> >> >> On 7/20/17 11:50 AM, Oesterlin, Robert wrote: >>> You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From cgirda at wustl.edu Thu Jul 20 21:42:09 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 15:42:09 -0500 Subject: [gpfsug-discuss] zimonGrafanaIntf template variable Message-ID: <00372fdc-a0b7-26ac-84c1-aa32c78e4261@wustl.edu> Hi, I imported the pre-built grafana dashboard. https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/a180eb7e-9161-4e07-a6e4-35a0a076f7b3/attachment/5e9a5886-5bd9-4a6f-919e-bc66d16760cf/media/default%20dashboards%20set.zip Get updates from few graphs but not all. I realize that I need to update the template variables. Eg:- I get into the "File Systems View" Variable ( gpfsMetrics_fs1 ) --> Query ( gpfsMetrics_fs1 ) Regex ( /.*[^gpfs_fs_inode_used|gpfs_fs_inode_alloc|gpfs_fs_inode_free|gpfs_fs_inode_max]/ ) Question: * How can I execute the above Query and regex to fix the issues. * Is there any document on CLI options? Thank you Chakri -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jul 21 22:13:17 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 21 Jul 2017 17:13:17 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Message-ID: <28986.1500671597@turing-police.cc.vt.edu> So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive service. 
Inode size is 4K, and we had a requirement to encrypt-at-rest, so encryption is in play as well. Data is replicated 2x and fragment size is 32K. I was investigating how much data-in-inode would help deal with users who put large trees of small files into the archive (yes, I know we can use applypolicy with external programs to tarball offending directories, but that's a separate discussion ;) ## ls -ls * 64 -rw-r--r-- 1 root root 2048 Jul 21 14:47 random.data 64 -rw-r--r-- 1 root root 512 Jul 21 14:48 random.data.1 64 -rw-r--r-- 1 root root 128 Jul 21 14:50 random.data.2 64 -rw-r--r-- 1 root root 32 Jul 21 14:50 random.data.3 64 -rw-r--r-- 1 root root 16 Jul 21 14:50 random.data.4 Hmm.. I was expecting at least *some* of these to fit in the inode, and not take 2 32K blocks... ## mmlsattr -d -L random.data.4 file name: random.data.4 metadata replication: 2 max 2 data replication: 2 max 2 immutable: no appendOnly: no flags: storage pool name: system fileset name: root snapshot name: creation time: Fri Jul 21 14:50:51 2017 Misc attributes: ARCHIVE Encrypted: yes gpfs.Encryption: 0x4541 (... another 296 hex digits) EncPar 'AES:256:XTS:FEK:HMACSHA512' type: wrapped FEK WrpPar 'AES:KWRAP' CmbPar 'XORHMACSHA512' KEY-97c7f4b7-06cb-4a53-b317-1c187432dc62:archKEY1_gpfsG1 Hmm.. Doesn't *look* like enough extended attributes to prevent storing even 16 bytes in the inode, should be room for around 3.5K minus the above 250 bytes or so of attributes.... What am I missing here? Does "encrypted" or LTFS/EE disable data-in-inode? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From oehmes at gmail.com Fri Jul 21 23:04:32 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 21 Jul 2017 22:04:32 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <28986.1500671597@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: Hi, i talked with a few others to confirm this, but unfortunate this is a limitation of the code today (maybe not well documented which we will look into). Encryption only encrypts data blocks, it doesn't encrypt metadata. Hence, if encryption is enabled, we don't store data in the inode, because then it wouldn't be encrypted. For the same reason HAWC and encryption are incompatible. Sven On Fri, Jul 21, 2017 at 2:13 PM wrote: > So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive > service. > Inode size is 4K, and we had a requirement to encrypt-at-rest, so > encryption > is in play as well. Data is replicated 2x and fragment size is 32K. > > I was investigating how much data-in-inode would help deal with users who > put > large trees of small files into the archive (yes, I know we can use > applypolicy > with external programs to tarball offending directories, but that's a > separate > discussion ;) > > ## ls -ls * > 64 -rw-r--r-- 1 root root 2048 Jul 21 14:47 random.data > 64 -rw-r--r-- 1 root root 512 Jul 21 14:48 random.data.1 > 64 -rw-r--r-- 1 root root 128 Jul 21 14:50 random.data.2 > 64 -rw-r--r-- 1 root root 32 Jul 21 14:50 random.data.3 > 64 -rw-r--r-- 1 root root 16 Jul 21 14:50 random.data.4 > > Hmm.. I was expecting at least *some* of these to fit in the inode, and > not take 2 32K blocks... 
> > ## mmlsattr -d -L random.data.4 > file name: random.data.4 > metadata replication: 2 max 2 > data replication: 2 max 2 > immutable: no > appendOnly: no > flags: > storage pool name: system > fileset name: root > snapshot name: > creation time: Fri Jul 21 14:50:51 2017 > Misc attributes: ARCHIVE > Encrypted: yes > gpfs.Encryption: 0x4541 (... another 296 hex digits) > EncPar 'AES:256:XTS:FEK:HMACSHA512' > type: wrapped FEK WrpPar 'AES:KWRAP' CmbPar 'XORHMACSHA512' > KEY-97c7f4b7-06cb-4a53-b317-1c187432dc62:archKEY1_gpfsG1 > > Hmm.. Doesn't *look* like enough extended attributes to prevent storing > even > 16 bytes in the inode, should be room for around 3.5K minus the above 250 > bytes > or so of attributes.... > > What am I missing here? Does "encrypted" or LTFS/EE disable data-in-inode? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jul 21 23:24:13 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 21 Jul 2017 18:24:13 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <33069.1500675853@turing-police.cc.vt.edu> On Fri, 21 Jul 2017 22:04:32 -0000, Sven Oehme said: > i talked with a few others to confirm this, but unfortunate this is a > limitation of the code today (maybe not well documented which we will look > into). Encryption only encrypts data blocks, it doesn't encrypt metadata. > Hence, if encryption is enabled, we don't store data in the inode, because > then it wouldn't be encrypted. For the same reason HAWC and encryption are > incompatible. I can live with that restriction if it's documented better, thanks... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From p.childs at qmul.ac.uk Mon Jul 24 10:29:49 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 09:29:49 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. Message-ID: <1500888588.571.3.camel@qmul.ac.uk> We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. From ilan84 at gmail.com Mon Jul 24 11:36:41 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Mon, 24 Jul 2017 13:36:41 +0300 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication Message-ID: Hi, I have gpfs with 2 Nodes (redhat). 
I am trying to create NFS share - So I would be able to mount and access it from another linux machine. I receive error: Current authentication: none is invalid. What do i need to configure ? PLEASE NOTE: I dont have the SMB package at the moment, I dont want authentication on the NFS export.. While trying to create NFS (I execute the following): [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" I receive the following error: [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmuserauth service list FILE access not configured PARAMETERS VALUES ------------------------------------------------- OBJECT access not configured PARAMETERS VALUES ------------------------------------------------- [root at LH20-GPFS1 ~]# Some additional information on cluster: ============================== [root at LH20-GPFS1 ~]# mmlsmgr file system manager node ---------------- ------------------ fs_gpfs01 10.10.158.61 (LH20-GPFS1) Cluster manager node: 10.10.158.61 (LH20-GPFS1) [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: LH20-GPFS1 GPFS cluster id: 10777108240438931454 GPFS UID domain: LH20-GPFS1 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 quorum From jonathan at buzzard.me.uk Mon Jul 24 12:43:10 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Jul 2017 12:43:10 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <28986.1500671597@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <1500896590.4387.167.camel@buzzard.me.uk> On Fri, 2017-07-21 at 17:13 -0400, valdis.kletnieks at vt.edu wrote: > So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive service. > Inode size is 4K, and we had a requirement to encrypt-at-rest, so encryption > is in play as well. Data is replicated 2x and fragment size is 32K. > For an archive service how about only accepting files in actual "archive" formats and then severely restricting the number of files a user can have? By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. Has a number of effects. Firstly it makes the files "big" so they move to tape efficiently. It also makes it less likely the end user will try and use it as an general purpose file server. As it's an archive there should be no problem for the user to bundle all the files into a .zip file or similar. Noting that Windows Vista and up handle ZIP64 files getting around the older 4GB and 65k files limit. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
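To make that concrete: on the user side it is just a bundling step, and on the service side a file-count (inode) quota does the policing. The file system name, user and limits below are invented for illustration, and the mmsetquota syntax should be double checked against the release in use:

# user side: one archive file instead of a tree of small files
tar -czf mydata_2017.tar.gz mydata/

# service side (sketch): cap how many files a user may own in the
# archive file system - "arcfs", "jbloggs" and the numbers are examples
mmsetquota arcfs --user jbloggs --files 1000:2000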
From stefan.dietrich at desy.de Mon Jul 24 13:19:47 2017 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Mon, 24 Jul 2017 14:19:47 +0200 (CEST) Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: <1981958989.2609398.1500898787132.JavaMail.zimbra@desy.de> Yep, have look at this Gist [1] The unit files assumes some paths and users, which are created during the installation of my RPM. [1] https://gist.github.com/stdietrich/b3b985f872ea648d6c03bb6249c44e72 Regards, Stefan ----- Original Message ----- > From: "Greg Lehmann" > To: gpfsug-discuss at spectrumscale.org > Sent: Wednesday, July 19, 2017 9:53:58 AM > Subject: Re: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data > I?m having a play with this now too. Has anybody coded a systemd unit to handle > step 2b in the knowledge centre article ? bridge creation on the gpfs side? It > would save me a bit of effort. > > > > I?m also wondering about the CherryPy version. It looks like this has been > developed on SLES which has the newer version mentioned as a standard package > and yet RHEL with an older version of CherryPy is perhaps more common as it > seems to have the best support for features of GPFS, like object and block > protocols. Maybe SLES is in favour now? > > > > Cheers, > > > > Greg > > > > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie > Sent: Thursday, 6 July 2017 3:07 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no > data > > > > > Greetings, > > > > > > > > > I'm currently setting up Grafana to interact with one of our Scale Clusters > > > and i've followed the knowledge centre link in terms of setup. 
> > > > > > [ > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm > | > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm > ] > > > > > > However while everything appears to be working i'm not seeing any data coming > through the reports within the grafana server, even though I can see data in > the Scale GUI > > > > > > The current environment: > > > > > > [root at sc01n02 ~]# mmlscluster > > > GPFS cluster information > ======================== > GPFS cluster name: sc01.spectrum > GPFS cluster id: 18085710661892594990 > GPFS UID domain: sc01.spectrum > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > > Node Daemon node name IP address Admin node name Designation > ------------------------------------------------------------------ > 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon > 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon > 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon > > > [root at sc01n02 ~]# > > > > > > > > > [root at sc01n02 ~]# mmlsconfig > Configuration data for cluster sc01.spectrum: > --------------------------------------------- > clusterName sc01.spectrum > clusterId 18085710661892594990 > autoload yes > profile gpfsProtocolDefaults > dmapiFileHandleSize 32 > minReleaseLevel 4.2.2.0 > ccrEnabled yes > cipherList AUTHONLY > maxblocksize 16M > [cesNodes] > maxMBpS 5000 > numaMemoryInterleave yes > enforceFilesetQuotaOnRoot yes > workerThreads 512 > [common] > tscCmdPortRange 60000-61000 > cesSharedRoot /ibm/cesSharedRoot/ces > cifsBypassTraversalChecking yes > syncSambaMetadataOps yes > cifsBypassShareLocksOnRename yes > adminMode central > > > File systems in cluster sc01.spectrum: > -------------------------------------- > /dev/cesSharedRoot > /dev/icos_demo > /dev/scale01 > [root at sc01n02 ~]# > > > > > > > > > [root at sc01n02 ~]# systemctl status pmcollector > ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. > Loaded: loaded (/etc/rc.d/init.d/pmcollector) > Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago > Docs: man:systemd-sysv-generator(8) > Main PID: 2693 (ZIMonCollector) > CGroup: /system.slice/pmcollector.service > ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... > ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... > > > May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance > mon...... > May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor > collector... > May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance > moni...r.. > Hint: Some lines were ellipsized, use -l to show in full. > > > > > > From Grafana Server: > > > > > > > > > > > > > > > when I send a set of files to the cluster (3.8GB) I can see performance metrics > within the Scale GUI > > > > > > > > > > > > yet from the Grafana Dashboard im not seeing any data points > > > > > > > > > > > > Can anyone provide some hints as to what might be happening? 
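A couple of quick checks usually narrow this kind of "GUI shows data, Grafana shows nothing" problem down. The port number is the default query port seen elsewhere in this digest and may differ in ZIMonCollector.cfg, and the metric name is only an example:

# is the collector process up and answering queries at all?
systemctl status pmcollector
ss -ltn | grep 9084

# does a direct query return data for a known metric?
# (check mmperfmon's help for the exact option syntax on your release)
mmperfmon query cpu_user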
> > > > > > > > > > > > Regards, > > > > > > > > > Andrew Beattie > > > Software Defined Storage - IT Specialist > > > Phone: 614-2133-7927 > > > E-mail: [ mailto:abeattie at au1.ibm.com | abeattie at au1.ibm.com ] > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jjdoherty at yahoo.com Mon Jul 24 14:11:12 2017 From: jjdoherty at yahoo.com (Jim Doherty) Date: Mon, 24 Jul 2017 13:11:12 +0000 (UTC) Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> Message-ID: <261384244.3866909.1500901872347@mail.yahoo.com> There are 3 places that the GPFS mmfsd uses memory? the pagepool? plus 2 shared memory segments.?? To see the memory utilization of the shared memory segments run the command?? mmfsadm dump malloc .??? The statistics for memory pool id 2 is where? maxFilesToCache/maxStatCache objects are? and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.?? You might want to upgrade to later PTF? as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.?? On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 24 14:30:49 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 13:30:49 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <261384244.3866909.1500901872347@mail.yahoo.com> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> Message-ID: <1500903047.571.7.camel@qmul.ac.uk> I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. 
[root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjdoherty at yahoo.com Mon Jul 24 15:10:45 2017 From: jjdoherty at yahoo.com (Jim Doherty) Date: Mon, 24 Jul 2017 14:10:45 +0000 (UTC) Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1500903047.571.7.camel@qmul.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> Message-ID: <1770436429.3911327.1500905445052@mail.yahoo.com> How are you identifying? the high memory usage???? On Monday, July 24, 2017 9:30 AM, Peter Childs wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. 
The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 24 15:21:27 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 14:21:27 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why.
In-Reply-To: <1770436429.3911327.1500905445052@mail.yahoo.com> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> Message-ID: <1500906086.571.9.camel@qmul.ac.uk> top but ps gives the same value. [root at dn29 ~]# ps auww -q 4444 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 4444 2.7 22.3 10537600 5472580 ? S wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. 
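For comparing the kernel's view of mmfsd with what GPFS accounts for itself, something along these lines gives the whole picture in one go (standard Linux tools plus the Scale commands already used above):

# kernel view of the daemon
ps -C mmfsd -o pid,vsz,rss,comm
pmap -x $(pidof mmfsd) | tail -1

# GPFS's own accounting: heap, shared segment and token manager pools
/usr/lpp/mmfs/bin/mmdiag --memory

# pagepool setting for reference
/usr/lpp/mmfs/bin/mmlsconfig pagepool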
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.huffman at crick.ac.uk Mon Jul 24 15:40:51 2017 From: adam.huffman at crick.ac.uk (Adam Huffman) Date: Mon, 24 Jul 2017 14:40:51 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1500906086.571.9.camel@qmul.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> <1500906086.571.9.camel@qmul.ac.uk> Message-ID: <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> smem is recommended here Cheers, Adam -- Adam Huffman Senior HPC and Cloud Systems Engineer The Francis Crick Institute 1 Midland Road London NW1 1AT T: 020 3796 1175 E: adam.huffman at crick.ac.uk W: www.crick.ac.uk On 24 Jul 2017, at 15:21, Peter Childs > wrote: top but ps gives the same value. [root at dn29 ~]# ps auww -q 4444 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 4444 2.7 22.3 10537600 5472580 ? S> wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. 
On Monday, July 24, 2017 5:29 AM, Peter Childs > wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From jamiedavis at us.ibm.com Mon Jul 24 15:45:26 2017 From: jamiedavis at us.ibm.com (James Davis) Date: Mon, 24 Jul 2017 14:45:26 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <33069.1500675853@turing-police.cc.vt.edu> References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: attlisjw.dat Type: application/octet-stream Size: 497 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jul 24 15:50:57 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 24 Jul 2017 14:50:57 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> Message-ID: I suppose the distinction between data, metadata and data IN metadata could be made. Whilst it is clear to me (us) now, perhaps the thought was that the data would be encrypted even if it was stored inside the metadata. My two pence. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of James Davis Sent: 24 July 2017 15:45 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Hey all, On the documentation of encryption restrictions and encryption/HAWC interplay... The encryption documentation currently states: "Secure storage uses encryption to make data unreadable to anyone who does not possess the necessary encryption keys...Only data, not metadata, is encrypted." The HAWC restrictions include: "Encrypted data is never stored in the recovery log..." If this is unclear, I'm open to suggestions for improvements. Cordially, Jamie ----- Original message ----- From: valdis.kletnieks at vt.edu Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Date: Fri, Jul 21, 2017 6:24 PM On Fri, 21 Jul 2017 22:04:32 -0000, Sven Oehme said: > i talked with a few others to confirm this, but unfortunate this is a > limitation of the code today (maybe not well documented which we will look > into). Encryption only encrypts data blocks, it doesn't encrypt metadata. > Hence, if encryption is enabled, we don't store data in the inode, because > then it wouldn't be encrypted. For the same reason HAWC and encryption are > incompatible. I can live with that restriction if it's documented better, thanks... [Document Icon]attq4saq.dat Type: application/pgp-signature Name: attq4saq.dat _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Mon Jul 24 15:57:13 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Jul 2017 15:57:13 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <33069.1500675853@turing-police.cc.vt.edu> , <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <1500908233.4387.194.camel@buzzard.me.uk> On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: > Hey all, > > On the documentation of encryption restrictions and encryption/HAWC > interplay... > > The encryption documentation currently states: > > "Secure storage uses encryption to make data unreadable to anyone who > does not possess the necessary encryption keys...Only data, not > metadata, is encrypted." > > The HAWC restrictions include: > > "Encrypted data is never stored in the recovery log..." > > If this is unclear, I'm open to suggestions for improvements. > Just because *DATA* is stored in the metadata does not make it magically metadata. It's still data so you could quite reasonably conclude that it is encrypted. We have now been disabused of this, but the documentation is not clear and needs clarifying. Perhaps say metadata blocks are not encrypted. Or just a simple data stored in inodes is not encrypted would suffice. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Mon Jul 24 16:49:07 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Jul 2017 11:49:07 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? 
In-Reply-To: <1500896590.4387.167.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> Message-ID: <17702.1500911347@turing-police.cc.vt.edu> On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > For an archive service how about only accepting files in actual > "archive" formats and then severely restricting the number of files a > user can have? > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. After having dealt with users who fill up disk storage for almost 4 decades now, I'm fully aware of those advantages. :) ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" in 1978, and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ square feet, and now 8T drives are all over the place...) On the flip side, my current project is migrating 5 petabytes of data from our old archive system that didn't have such rules (mostly due to politics and the fact that the underlying XFS filesystem uses a 4K blocksize so it wasn't as big an issue), so I'm stuck with what people put in there years ago. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Mon Jul 24 16:49:26 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 24 Jul 2017 15:49:26 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: Ilan, you must create some type of authentication mechanism for CES to work properly first. If you want a quick and dirty way that would just use your local /etc/passwd try this. /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined Mark -----Original Message----- From: Ilan Schwarts [mailto:ilan84 at gmail.com] Sent: Monday, July 24, 2017 5:37 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication Hi, I have gpfs with 2 Nodes (redhat). I am trying to create NFS share - So I would be able to mount and access it from another linux machine. I receive error: Current authentication: none is invalid. What do i need to configure ? PLEASE NOTE: I dont have the SMB package at the moment, I dont want authentication on the NFS export.. While trying to create NFS (I execute the following): [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" I receive the following error: [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. 
[root at LH20-GPFS1 ~]# mmuserauth service list FILE access not configured PARAMETERS VALUES ------------------------------------------------- OBJECT access not configured PARAMETERS VALUES ------------------------------------------------- [root at LH20-GPFS1 ~]# Some additional information on cluster: ============================== [root at LH20-GPFS1 ~]# mmlsmgr file system manager node ---------------- ------------------ fs_gpfs01 10.10.158.61 (LH20-GPFS1) Cluster manager node: 10.10.158.61 (LH20-GPFS1) [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: LH20-GPFS1 GPFS cluster id: 10777108240438931454 GPFS UID domain: LH20-GPFS1 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 quorum _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions From valdis.kletnieks at vt.edu Mon Jul 24 17:35:34 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Jul 2017 12:35:34 -0400 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: <27469.1500914134@turing-police.cc.vt.edu> On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: > Hi, > I have gpfs with 2 Nodes (redhat). > I am trying to create NFS share - So I would be able to mount and > access it from another linux machine. > While trying to create NFS (I execute the following): > [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* > Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" You can get away with little to no authentication for NFSv3, but not for NFSv4. Try with Protocols=3 only and mmuserauth service create --type userdefined that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS client tells you". This of course only works sanely if each NFS export is only to a set of machines in the same administrative domain that manages their UID/GIDs. Exporting to two sets of machines that don't coordinate their UID/GID space is, of course, where hilarity and hijinks ensue.... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From luke.raimbach at googlemail.com Mon Jul 24 23:23:03 2017 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Mon, 24 Jul 2017 22:23:03 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> <1500906086.571.9.camel@qmul.ac.uk> <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> Message-ID: Switch of CCR and see what happens. On Mon, 24 Jul 2017, 15:40 Adam Huffman, wrote: > smem is recommended here > > Cheers, > Adam > > -- > > Adam Huffman > Senior HPC and Cloud Systems Engineer > The Francis Crick Institute > 1 Midland Road > London NW1 1AT > > T: 020 3796 1175 > E: adam.huffman at crick.ac.uk > W: www.crick.ac.uk > > > > > > On 24 Jul 2017, at 15:21, Peter Childs wrote: > > > top > > but ps gives the same value. > > [root at dn29 ~]# ps auww -q 4444 > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > root 4444 2.7 22.3 10537600 5472580 ? S /usr/lpp/mmfs/bin/mmfsd > > Thanks for the help > > Peter. > > > On Mon, 2017-07-24 at 14:10 +0000, Jim Doherty wrote: > > How are you identifying the high memory usage? > > > On Monday, July 24, 2017 9:30 AM, Peter Childs > wrote: > > > I've had a look at mmfsadm dump malloc and it looks to agree with the > output from mmdiag --memory. and does not seam to account for the excessive > memory usage. > > The new machines do have idleSocketTimout set to 0 from what your saying > it could be related to keeping that many connections between nodes working. > > Thanks in advance > > Peter. > > > > > [root at dn29 ~]# mmdiag --memory > > === mmdiag: memory === > mmfsd heap size: 2039808 bytes > > > Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") > 128 bytes in use > 17500049370 hard limit on memory usage > 1048576 bytes committed to regions > 1 number of regions > 555 allocations > 555 frees > 0 allocation failures > > > Statistics for MemoryPool id 2 ("Shared Segment") > 42179592 bytes in use > 17500049370 hard limit on memory usage > 56623104 bytes committed to regions > 9 number of regions > 100027 allocations > 79624 frees > 0 allocation failures > > > Statistics for MemoryPool id 3 ("Token Manager") > 2099520 bytes in use > 17500049370 hard limit on memory usage > 16778240 bytes committed to regions > 1 number of regions > 4 allocations > 0 frees > 0 allocation failures > > > On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: > > There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 > shared memory segments. To see the memory utilization of the shared > memory segments run the command mmfsadm dump malloc . The statistics > for memory pool id 2 is where maxFilesToCache/maxStatCache objects are > and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. > > You might want to upgrade to later PTF as there was a PTF to fix a memory > leak that occurred in tscomm associated with network connection drops. > > > On Monday, July 24, 2017 5:29 AM, Peter Childs > wrote: > > > We have two GPFS clusters. > > One is fairly old and running 4.2.1-2 and non CCR and the nodes run > fine using up about 1.5G of memory and is consistent (GPFS pagepool is > set to 1G, so that looks about right.) 
> > The other one is "newer" running 4.2.1-3 with CCR and the nodes keep > increasing in there memory usage, starting at about 1.1G and are find > for a few days however after a while they grow to 4.2G which when the > node need to run real work, means the work can't be done. > > I'm losing track of what maybe different other than CCR, and I'm trying > to find some more ideas of where to look. > > I'm checked all the standard things like pagepool and maxFilesToCache > (set to the default of 4000), workerThreads is set to 128 on the new > gpfs cluster (against default 48 on the old) > > I'm not sure what else to look at on this one hence why I'm asking the > community. > > Thanks in advance > > Peter Childs > ITS Research Storage > Queen Mary University of London. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > The Francis Crick Institute Limited is a registered charity in England and > Wales no. 1140062 and a company registered in England and Wales no. > 06885462, with its registered office at 1 Midland Road London NW1 1AT > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 25 05:52:11 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 25 Jul 2017 07:52:11 +0300 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: <27469.1500914134@turing-police.cc.vt.edu> References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). 
>> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From ulmer at ulmer.org Tue Jul 25 06:33:13 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Tue, 25 Jul 2017 01:33:13 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500908233.4387.194.camel@buzzard.me.uk> References: <33069.1500675853@turing-police.cc.vt.edu> <28986.1500671597@turing-police.cc.vt.edu> <1500908233.4387.194.camel@buzzard.me.uk> Message-ID: <1233C5A4-A8C9-4A56-AEC3-AE65DBB5D346@ulmer.org> > On Jul 24, 2017, at 10:57 AM, Jonathan Buzzard > wrote: > > On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: >> Hey all, >> >> On the documentation of encryption restrictions and encryption/HAWC >> interplay... >> >> The encryption documentation currently states: >> >> "Secure storage uses encryption to make data unreadable to anyone who >> does not possess the necessary encryption keys...Only data, not >> metadata, is encrypted." >> >> The HAWC restrictions include: >> >> "Encrypted data is never stored in the recovery log..." >> >> If this is unclear, I'm open to suggestions for improvements. >> > > Just because *DATA* is stored in the metadata does not make it magically > metadata. It's still data so you could quite reasonably conclude that it > is encrypted. > [?] > JAB. +1. Also, "Encrypted data is never stored in the recovery log?" does not make it clear whether: The data that is supposed to be encrypted is not written to the recovery log. The data that is supposed to be encrypted is written to the recovery log, but is not encrypted there. Thanks, -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Tue Jul 25 10:02:14 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Jul 2017 10:02:14 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <17702.1500911347@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> Message-ID: <1500973334.4387.201.camel@buzzard.me.uk> On Mon, 2017-07-24 at 11:49 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > > > For an archive service how about only accepting files in actual > > "archive" formats and then severely restricting the number of files a > > user can have? > > > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. 
> > After having dealt with users who fill up disk storage for almost 4 decades > now, I'm fully aware of those advantages. :) > > ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" in 1978, > and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ square feet, and > now 8T drives are all over the place...) > > On the flip side, my current project is migrating 5 petabytes of data from our > old archive system that didn't have such rules (mostly due to politics and the > fact that the underlying XFS filesystem uses a 4K blocksize so it wasn't as big > an issue), so I'm stuck with what people put in there years ago. I would be tempted to zip up the directories and move them ziped ;-) JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From john.hearns at asml.com Tue Jul 25 10:30:28 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 25 Jul 2017 09:30:28 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500973334.4387.201.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: I agree with Jonathan. In my experience, if you look at why there are many small files being stored by researchers, these are either the results of data acquisition - high speed cameras, microscopes, or in my experience a wind tunnel. Or the images are a sequence of images produced by a simulation which are later post-processed into a movie or Ensight/Paraview format. When questioned, the resaechers will always say "but I would like to keep this data available just in case". In reality those files are never looked at again. And as has been said if you have a tape based archiving system you could end up with thousands of small files being spread all over your tapes. So it is legitimate to make zips / tars of directories like that. I am intrigued to see that GPFS has a policy facility which can call an external program. That is useful. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Buzzard Sent: Tuesday, July 25, 2017 11:02 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? On Mon, 2017-07-24 at 11:49 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > > > For an archive service how about only accepting files in actual > > "archive" formats and then severely restricting the number of files > > a user can have? > > > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. > > After having dealt with users who fill up disk storage for almost 4 > decades now, I'm fully aware of those advantages. :) > > ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" > in 1978, and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ > square feet, and now 8T drives are all over the place...) > > On the flip side, my current project is migrating 5 petabytes of data > from our old archive system that didn't have such rules (mostly due to > politics and the fact that the underlying XFS filesystem uses a 4K > blocksize so it wasn't as big an issue), so I'm stuck with what people put in there years ago. I would be tempted to zip up the directories and move them ziped ;-) JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7Ce8a4016223414177bf9408d4d33bdb31%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=pean0PRBgJJmtbZ7TwO%2BxiSvhKsba%2FRGI9VUCxhp6kM%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From jonathan at buzzard.me.uk Tue Jul 25 12:22:49 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Jul 2017 12:22:49 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: <1500981769.4387.222.camel@buzzard.me.uk> On Tue, 2017-07-25 at 09:30 +0000, John Hearns wrote: > I agree with Jonathan. > > In my experience, if you look at why there are many small files being > stored by researchers, these are either the results of data acquisition > - high speed cameras, microscopes, or in my experience a wind tunnel. > Or the images are a sequence of images produced by a simulation which > are later post-processed into a movie or Ensight/Paraview format. When > questioned, the resaechers will always say "but I would like to keep > this data available just in case". In reality those files are never > looked at again. And as has been said if you have a tape based > archiving system you could end up with thousands of small files being > spread all over your tapes. So it is legitimate to make zips / tars of > directories like that. > Note that rules on data retention may require them to keep them for 10 years, so it is not unreasonable. Letting them spew thousands of files into an "archive" is not sensible. I was thinking of ways of getting the users to do it, and I guess leaving them with zero available file number quota in the new system would force them to zip up their data so they could add new stuff ;-) Archives in my view should have no quota on the space, only quota's on the number of files. Of course that might not be very popular. On reflection I think I would use a policy to restrict to files ending with .zip/.ZIP only. It's an archive and this format is effectively open source, widely understood and cross platform, and with the ZIP64 version will now stand the test of time too. Given it's an archive I would have a script that ran around setting all the files to immutable 7 days after creation too. 
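Something along these lines is roughly what I have in mind for the immutable part - only a sketch, untested; the fileset name 'archive' and the paths are invented, and you should check the ILM chapter for the exact format of the file list that mmapplypolicy hands to the external script:

  # /tmp/lock_archive.pol -- select week-old zips in the archive fileset
  RULE EXTERNAL LIST 'lockup' EXEC '/usr/local/sbin/lock_archive.sh'
  RULE 'oldzips' LIST 'lockup'
       FOR FILESET ('archive')
       WHERE LOWER(NAME) LIKE '%.zip'
         AND (CURRENT_TIMESTAMP - MODIFICATION_TIME) > INTERVAL '7' DAYS

The helper script just marks whatever got selected as immutable:

  #!/bin/bash
  # /usr/local/sbin/lock_archive.sh -- called by mmapplypolicy
  # $1 is the operation (TEST or LIST), $2 the file listing the selected
  # files, one per line with the path after the " -- " separator
  case "$1" in
    TEST) exit 0 ;;
    LIST) sed 's/^.* -- //' "$2" | while IFS= read -r f; do
            mmchattr -i yes "$f"    # set the GPFS immutable flag
          done ;;
  esac
  exit 0

Then run it from cron, say nightly:

  mmapplypolicy /gpfs/archive -P /tmp/lock_archive.pol -I yes
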
Or maybe change the ownership and set a readonly ACL to the original user. Need to stop them changing stuff after the event if you are going to use to as part of your anti research fraud measures. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Tue Jul 25 17:11:45 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 25 Jul 2017 12:11:45 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500973334.4387.201.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: <88035.1500999105@turing-police.cc.vt.edu> On Tue, 25 Jul 2017 10:02:14 +0100, Jonathan Buzzard said: > I would be tempted to zip up the directories and move them ziped ;-) Not an option, unless you want to come here and re-write the researcher's tracking systems that knows where they archived a given run, and teach it "Except now it's in a .tar.gz in that directory, or perhaps one or two directories higher up, under some name". Yes, researchers do that. And as the joke goes: "What's the difference between a tenured professor and a terrorist?" "You can negotiate with a terrorist..." Plus remember that most of these directories are currently scattered across multiple tapes, which means "zip up a directory" may mean reading as many as 10 to 20 tapes just to get the directory on disk so you can zip it up. As it is, I had to write code that recall and processes all the files on tape 1, *wherever they are in the file system*, free them from the source disk, recall and process all the files on tape 2, repeat until tape 3,857. (And due to funding issues 5 years ago which turned into a "who paid for what tapes" food fight, most of the tapes ended up with files from entirely different file systems on them, going into different filesets on the destination). (And in fact, the migration is currently hosed up because a researcher *is* doing pretty much that - recalling all the files from one directory, then the next, then the next, to get files they need urgently for a deliverable but haven't been moved to the new system. So rather than having 12 LTO-5 drives to multistream the tape recalls, I've got 12 recalls fighting for one drive while the researcher's processing is hogging the other 11, due to the way the previous system prioritizes in-line opens of files versus bulk recalls) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From scbatche at us.ibm.com Tue Jul 25 21:46:45 2017 From: scbatche at us.ibm.com (Scott C Batchelder) Date: Tue, 25 Jul 2017 15:46:45 -0500 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf Message-ID: Hello: I am wondering if I can get some more information on the gpfsperf tool for baseline testing GPFS. I want to record GPFS read and write performance for a file system on the cluster before I enable DMAPI and configure the HSM interface. The README for the tool does not offer much insight in how I should run this tool based on the cluster or file system settings. The cluster that I will be running this tool on will not have MPI installed and will have multiple file systems in the cluster. Are there some best practises for running this tool? 
For example: - Should the number of threads equal the number of NSDs for the file system? or equal to the number of nodes? - If I execute a large multi-threaded run of this tool from a single node in the cluster, will that give me an accurate result of the performance of the file system? Any feedback is appreciated. Thanks. Sincerely, Scott Batchelder Phone: 1-281-883-7926 E-mail: scbatche at us.ibm.com 12301 Kurland Dr Houston, TX 77034-4812 United States -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 2022 bytes Desc: not available URL: From valdis.kletnieks at vt.edu Wed Jul 26 00:59:08 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 25 Jul 2017 19:59:08 -0400 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf In-Reply-To: References: Message-ID: <13777.1501027148@turing-police.cc.vt.edu> On Tue, 25 Jul 2017 15:46:45 -0500, "Scott C Batchelder" said: > - Should the number of threads equal the number of NSDs for the file > system? or equal to the number of nodes? Depends on what definition of "throughput" you are interested in. If your configuration has 50 clients banging on 5 NSD servers, your numbers for 5 threads and 50 threads are going to tell you subtly different things... (Basically, one thread per NSD is going to tell you the maximum that one client can expect to get with little to no contention, while one per client will tell you about the maximum *aggregate* that all 50 can get together - which is probably still giving each individual client less throughput than one-to-one....) We usually test with "exactly one thread total", "one thread per server", and "keep piling the clients on till the total number doesn't get any bigger". Also be aware that it only gives you insight to your workload performance if your workload is comprised of large file access - if your users are actually doing a lot of medium or small files, that changes the results dramatically as you end up possibly pounding on metadata more than the actual data.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From varun.mittal at in.ibm.com Wed Jul 26 04:42:27 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Wed, 26 Jul 2017 09:12:27 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. 
Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From varun.mittal at in.ibm.com Wed Jul 26 04:44:24 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Wed, 26 Jul 2017 09:14:24 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. 
this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Wed Jul 26 18:28:55 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Jul 2017 17:28:55 +0000 Subject: [gpfsug-discuss] Lost disks Message-ID: I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it's due to a back end disk issue or if it's a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn't appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren't 100% sure that something at the disk array couldn't have caused this. Is there an easy way to see if there is still data on these disks? Short of a full restore from backup what other options might they have? The mmlsnsd -X show's blanks for device and device type now. # mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- INGEST_FILEMGR_xis2301 0A23982E57FD995D - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2301 0A23982E57FD995D - - ingest-filemgr02.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - ingest-filemgr02.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2303 0A23982E57FD9962 - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. 
If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From kums at us.ibm.com Wed Jul 26 18:37:45 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Wed, 26 Jul 2017 13:37:45 -0400 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf In-Reply-To: <13777.1501027148@turing-police.cc.vt.edu> References: <13777.1501027148@turing-police.cc.vt.edu> Message-ID: Hi Scott, >>- Should the number of threads equal the number of NSDs for the file system? or equal to the number of nodes? >>- If I execute a large multi-threaded run of this tool from a single node in the cluster, will that give me an accurate result of the performance of the file system? To add to Valdis's note, the answer to the above also depends on the nodes, the network used for GPFS communication between client and server, and the storage performance capabilities that make up the GPFS cluster/network/storage stack. As an example, suppose the storage subsystem (including controller + disks) hosting the file system can deliver ~20 GB/s and the networking between NSD client and server is FDR 56Gb/s InfiniBand (with verbsRdma this is ~6 GB/s). Assuming one FDR-IB link (verbsPorts) is configured per NSD server as well as per client, you would need a minimum of 4 NSD servers (4 x 6 GB/s ==> 24 GB/s) to saturate the backend storage. So you would need to run gpfsperf (or any other parallel I/O benchmark) across a minimum of 4 GPFS NSD clients to saturate the backend storage. You can scale the gpfsperf thread counts (-th parameter) depending on access pattern (buffered/dio etc.) but this would only be able to drive load from a single NSD client node. If you would like to drive I/O load from multiple NSD client nodes and synchronize the parallel runs across those nodes for accuracy, then gpfsperf-mpi is strongly recommended. You would need to use MPI to launch gpfsperf-mpi across multiple NSD client nodes and scale the MPI processes (one or more MPI processes per NSD client) accordingly to drive the I/O load for good performance. >>The cluster that I will be running this tool on will not have MPI installed and will have multiple file systems in the cluster. Without MPI, the alternative would be to use ssh or pdsh to launch gpfsperf across multiple nodes; however, if there are slow NSD clients the results may not be accurate (slow clients take longer, and once the faster clients have finished they get all the network/storage resources to themselves, skewing the performance analysis). You may also consider using parallel Iozone, as it can be run across multiple nodes using rsh/ssh with a combination of the "-+m" and "-t" options. http://iozone.org/docs/IOzone_msword_98.pdf ## -+m filename Use this file to obtain the configuration information of the clients for cluster testing. The file contains one line for each client. Each line has three fields. The fields are space delimited. A # sign in column zero is a comment line. The first field is the name of the client. The second field is the path, on the client, for the working directory where Iozone will execute. The third field is the path, on the client, for the executable Iozone. To use this option one must be able to execute commands on the clients without being challenged for a password. Iozone will start remote execution by using "rsh". To use ssh, export RSH=/usr/bin/ssh -t # Run Iozone in a throughput mode. This option allows the user to specify how many threads or processes to have active during the measurement. ## Hope this helps, -Kums
From: valdis.kletnieks at vt.edu To: gpfsug main discussion list Date: 07/25/2017 07:59 PM Subject: Re: [gpfsug-discuss] Baseline testing GPFS with gpfsperf Sent by: gpfsug-discuss-bounces at spectrumscale.org On Tue, 25 Jul 2017 15:46:45 -0500, "Scott C Batchelder" said: > - Should the number of threads equal the number of NSDs for the file > system? or equal to the number of nodes? Depends on what definition of "throughput" you are interested in. If your configuration has 50 clients banging on 5 NSD servers, your numbers for 5 threads and 50 threads are going to tell you subtly different things... (Basically, one thread per NSD is going to tell you the maximum that one client can expect to get with little to no contention, while one per client will tell you about the maximum *aggregate* that all 50 can get together - which is probably still giving each individual client less throughput than one-to-one....) We usually test with "exactly one thread total", "one thread per server", and "keep piling the clients on till the total number doesn't get any bigger". Also be aware that it only gives you insight into your workload performance if your workload is comprised of large file access - if your users are actually doing a lot of medium or small files, that changes the results dramatically as you end up possibly pounding on metadata more than the actual data.... [attachment "att0twxd.dat" deleted by Kumaran Rajaram/Arlington/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From Robert.Oesterlin at nuance.com Wed Jul 26 18:45:35 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 26 Jul 2017 17:45:35 +0000 Subject: [gpfsug-discuss] Lost disks Message-ID: One way this could possibly happen would be if a system is being installed (I'm assuming this is Linux) while the FC adapter is active; the OS install will then see the disks and wipe out the NSD descriptor on them. (Which is why the NSD V2 format was invented, to prevent this from happening.) If you don't lose all of the descriptors, it's sometimes possible to manually re-construct the missing header information - I'm assuming since you opened a PMR, IBM has looked at this. This is a scenario I've had to recover from - twice. Back-end array issue seems unlikely to me; I'd keep looking at the systems with access to those LUNs and see what commands/operations could have been run.
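A quick, read-only way to see which LUNs still have something that looks like a descriptor (just a sketch - the device names are placeholders, and this only covers the old v1 layout where the descriptor sits in the first few sectors of the raw disk):

  for d in /dev/sdb /dev/sdc; do     # substitute the real LUN devices
      echo "== $d"
      dd if=$d bs=512 count=8 2>/dev/null | strings | grep "NSD descriptor"
  done

A LUN whose header is intact should print a line like "NSD descriptor for ... created by GPFS ..."; the ones that come back empty are the ones that have been overwritten.
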
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client that has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it's due to a back end disk issue or if it's a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn't appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren't 100% sure that something at the disk array couldn't have caused this.
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From oehmes at gmail.com Wed Jul 26 19:18:38 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 26 Jul 2017 18:18:38 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: It can happen for multiple reasons; one is a Linux install, but unfortunately there are significantly simpler explanations. Linux, as well as the BIOS in some servers, from time to time looks for empty disks and puts a GPT label on them if the disk doesn't have one, etc. This thread explains a lot of this: https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014439222 This is why we implemented the NSD V2 format a long time ago. Unfortunately there is no way to convert a V1 NSD to a V2 NSD on an existing filesystem except to remove the NSDs one at a time and re-add them after you have upgraded the system to at least GPFS 4.1 (I would recommend a later version like 4.2.3). Some more details are here in this thread: https://www.ibm.com/developerworks/community/forums/html/threadTopic?id=5c1ee5bc-41b8-4318-a74e-4d962f82ce2e but a quick summary of the benefits of V2:
- Support for GPT NSDs
- Adds a standard disk partition table (GPT type) to NSDs
- Disk label support for Linux
- The new GPFS NSD v2 format provides the following benefits:
  - Includes a partition table so that the disk is recognized as a GPFS device
  - Adjusts data alignment to support disks with a 4 KB physical block size
  - Adds backup copies of some key GPFS data structures
  - Expands some reserved areas to allow for future growth
The main reason we can't convert from V1 to V2 is that the on-disk format changed significantly, so we would have to move on-disk data, which is very risky. Hope that explains this. Sven On Wed, Jul 26, 2017 at 10:29 AM Mark Bush wrote: > I have a client that has had an issue where all of the nsd disks disappeared in > the cluster recently. Not sure if it's due to a back end disk issue or if > it's a reboot that did it. But in their PMR they were told that all that > data is lost now and that the disk headers didn't appear as GPFS disk > headers. How on earth could something like that happen? Could it be a > backend disk thing? They are confident that nobody tried to reformat disks > but aren't 100% sure that something at the disk array couldn't have caused > this. > > > > Is there an easy way to see if there is still data on these disks? > > Short of a full restore from backup what other options might they have? > > > > The mmlsnsd -X shows blanks for device and device type now. 
> > > > # mmlsnsd -X > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > INGEST_FILEMGR_xis2301 0A23982E57FD995D - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2301 0A23982E57FD995D - - > ingest-filemgr02.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - > ingest-filemgr02.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2303 0A23982E57FD9962 - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > > > > > *Mark* > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > Sirius Computer Solutions > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Jul 26 19:19:15 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Jul 2017 18:19:15 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. 
But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 26 20:05:59 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 26 Jul 2017 19:05:59 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: IBM has a procedure for it that may work in some cases, but you?re manually editing the NSD descriptors on disk. Contact IBM if you think an NSD has been lost to descriptor being re-written. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 1:19 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Jul 27 11:39:28 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 27 Jul 2017 10:39:28 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: Mark, I once rescued a system which had the disk partition on the OS disks deleted. (This was a system with a device mapper RAID pair of OS disks). Download a copy of sysrescue http://www.system-rescue-cd.org/ and create a bootable USB stick (or network boot). When you boot the system in sysrescue it has a utility to scan disks which will identify existing partitions, even if the partition table has been erased. I can?t say if this will do anything with the disks in your system, but this is certainly worth a try if you suspect that the data is all still on disk. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: Wednesday, July 26, 2017 8:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. 
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Thu Jul 27 11:58:08 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 11:58:08 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: <1501153088.26563.39.camel@buzzard.me.uk> On Wed, 2017-07-26 at 17:45 +0000, Oesterlin, Robert wrote: > One way this could possible happen would be a system is being > installed (I?m assuming this is Linux) and the FC adapter is active; > then the OS install will see disks and wipe out the NSD descriptor on > those disks. (Which is why the NSD V2 format was invented, to prevent > this from happening) If you don?t lose all of the descriptors, it?s > sometimes possible to manually re-construct the missing header > information - I?m assuming since you opened a PMR, IBM has looked at > this. This is a scenario I?ve had to recover from - twice. Back-end > array issue seems unlikely to me, I?d keep looking at the systems with > access to those LUNs and see what commands/operations could have been > run. I would concur that this is the most likely scenario; an install where for whatever reason the machine could see the disks and they are gone. I know that RHEL6 and its derivatives will do that for you. Has happened to me at previous place of work where another admin forgot to de-zone a server, went to install CentOS6 as part of a cluster upgrade from CentOS5 and overwrote all the NSD descriptors. Thing is GPFS does not look at the NSD descriptors that much. So in my case it was several days before it was noticed, and only then because I rebooted the last NSD server as part of a rolling upgrade of GPFS. I could have cruised for weeks/months with no NSD descriptors if I had not restarted all the NSD servers. The moral of this is the overwrite could have take place quite some time ago. Basically if the disks are all missing then the NSD descriptor has been overwritten, and the protestations of the client are irrelevant. The chances of the disk array doing it to *ALL* the disks is somewhere around ? IMHO. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From richard.rupp at us.ibm.com Thu Jul 27 12:28:35 2017 From: richard.rupp at us.ibm.com (RICHARD RUPP) Date: Thu, 27 Jul 2017 07:28:35 -0400 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: If you are under IBM support, leverage IBM for help. A third party utility has the possibility of making it worse. From: John Hearns To: gpfsug main discussion list Date: 07/27/2017 06:40 AM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org Mark, I once rescued a system which had the disk partition on the OS disks deleted. (This was a system with a device mapper RAID pair of OS disks). Download a copy of sysrescue http://www.system-rescue-cd.org/ and create a bootable USB stick (or network boot). When you boot the system in sysrescue it has a utility to scan disks which will identify existing partitions, even if the partition table has been erased. I can?t say if this will do anything with the disks in your system, but this is certainly worth a try if you suspect that the data is all still on disk. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: Wednesday, July 26, 2017 8:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. 
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush < Mark.Bush at siriuscom.com> Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Thu Jul 27 12:58:50 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 12:58:50 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: <1501156730.26563.49.camel@strath.ac.uk> On Thu, 2017-07-27 at 07:28 -0400, RICHARD RUPP wrote: > If you are under IBM support, leverage IBM for help. A third party > utility has the possibility of making it worse. > The chances of recovery are slim in the first place from this sort of problem. At least with v1 NSD descriptors. Further IBM have *ALREADY* told him the data is lost, I quote But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. So in this scenario you have little to loose trying something because you are now on your own. Worst case scenario is that whatever you try does not work, which leave you no worse of than you are now. Well apart from lost time for the restore, but you might have started that already to somewhere else. I was once told by IBM (nine years ago now) that my GPFS file system was caput and to arrange a restore from tape. At which point some fiddling by myself fixed the problem and a 100TB restore was no longer required. However this was not due to overwritten NSD descriptors. When that happened the two file systems effected had to be restored. Well bizarrely one was still mounted and I was able to rsync the data off. However the point is that at this stage fiddling with third party tools is the only option left. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From UWEFALKE at de.ibm.com Thu Jul 27 15:18:02 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 27 Jul 2017 16:18:02 +0200 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501156730.26563.49.camel@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> Message-ID: "Just doing something" makes things worse usually. Whether a 3rd party tool knows how to handle GPFS NSDs can be doubted (as long as it is not dedicated to that purpose). First, I'd look what is actually on the sectors where the NSD headers used to be, and try to find whether data beyond that area were also modified (if the latter is the case, restoring the NSDs does not make much sense as data and/or metadata (depending on disk usage) would also be corrupted. If you are sure that just the NSD header area has been affected, you might try to trick GPFS in getting just the information into the header area needed that GPFS recognises the devices as the NSDs they were. 
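For the looking - and as a safety copy before anyone writes a single byte - something along these lines might do (a sketch only; the device names are placeholders and everything here is strictly read-only):

  mkdir -p /root/nsd-headers
  for d in /dev/mapper/lun01 /dev/mapper/lun02; do   # substitute the real NSD devices
      # keep the first 8 sectors (4 KiB) of each suspect LUN for inspection and as a backup
      dd if=$d of=/root/nsd-headers/$(basename $d).first4k bs=512 count=8
      od --address-radix=x -xc /root/nsd-headers/$(basename $d).first4k
  done
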
The first 4 kiB of a v1 NSD from a VM on my laptop look like $ cat nsdv1head | od --address-radix=x -xc 000000 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000200 cf70 4192 0000 0100 0000 3000 e930 a028 p 317 222 A \0 \0 \0 001 \0 \0 \0 0 0 351 ( 240 000210 a8c0 ce7a a251 1f92 a251 1a92 0000 0800 300 250 z 316 Q 242 222 037 Q 242 222 032 \0 \0 \0 \b 000220 0000 f20f 0000 0000 0000 0000 0000 0000 \0 \0 017 362 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000230 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000400 93d2 7885 0000 0100 0000 0002 141e 64a8 322 223 205 x \0 \0 \0 001 \0 \0 002 \0 036 024 250 d 000410 a8c0 ce7a a251 3490 0000 fa0f 0000 0800 300 250 z 316 Q 242 220 4 \0 \0 017 372 \0 \0 \0 \b 000420 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000480 534e 2044 6564 6373 6972 7470 726f 6620 N S D d e s c r i p t o r f 000490 726f 2f20 6564 2f76 6476 2062 7263 6165 o r / d e v / v d b c r e a 0004a0 6574 2064 7962 4720 4650 2053 6f4d 206e t e d b y G P F S M o n 0004b0 614d 2079 3732 3020 3a30 3434 303a 2034 M a y 2 7 0 0 : 4 4 : 0 4 0004c0 3032 3331 000a 0000 0000 0000 0000 0000 2 0 1 3 \n \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 0004d0 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000e00 4c5f 4d56 0000 017d 0000 017d 0000 017d _ L V M \0 \0 } 001 \0 \0 } 001 \0 \0 } 001 000e10 0000 017d 0000 0000 0000 0000 0000 0000 \0 \0 } 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000e20 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000e30 0000 0000 0000 0000 0000 0000 017d 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 } 001 \0 \0 000e40 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 001000 I suppose, the important area starts at 0x0200 (ie. with the second 512Byte sector) and ends at 0x04df (which would be within the 3rd 512Bytes sector, hence the 2nd and 3rd sectors appear crucial). I think that there is some more space before the payload area starts. Without knowledge what exactly has to go into the header, I'd try to create an NSD on one or two (new) disks, save the headers, then create an FS on them, save the headers again, check if anything has changed. So, creating some new NSDs, checking what keys might appear there and in the cluster configuration could get you very close to craft the header information which is gone. Of course, that depends on how dear the data on the gone FS AKA SG are and how hard it'd be to rebuild them otherwise (replay from backup, recalculate, ...) It seems not a bad idea to set aside the NSD headers of your NSDs in a back up :-) And also now: Before amending any blocks on your disks, save them! Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/27/2017 01:59 PM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org On Thu, 2017-07-27 at 07:28 -0400, RICHARD RUPP wrote: > If you are under IBM support, leverage IBM for help. A third party > utility has the possibility of making it worse. > The chances of recovery are slim in the first place from this sort of problem. At least with v1 NSD descriptors. Further IBM have *ALREADY* told him the data is lost, I quote But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. So in this scenario you have little to loose trying something because you are now on your own. Worst case scenario is that whatever you try does not work, which leave you no worse of than you are now. Well apart from lost time for the restore, but you might have started that already to somewhere else. I was once told by IBM (nine years ago now) that my GPFS file system was caput and to arrange a restore from tape. At which point some fiddling by myself fixed the problem and a 100TB restore was no longer required. However this was not due to overwritten NSD descriptors. When that happened the two file systems effected had to be restored. Well bizarrely one was still mounted and I was able to rsync the data off. However the point is that at this stage fiddling with third party tools is the only option left. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Thu Jul 27 16:09:31 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 16:09:31 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: <1501156730.26563.49.camel@strath.ac.uk> Message-ID: <1501168171.26563.56.camel@strath.ac.uk> On Thu, 2017-07-27 at 16:18 +0200, Uwe Falke wrote: > "Just doing something" makes things worse usually. Whether a 3rd > party tool knows how to handle GPFS NSDs can be doubted (as long as it > is not dedicated to that purpose). It might usually, but IBM have *ALREADY* given up in this case and told the customer their data is toast. Under these circumstances other than wasting time that could have been spent profitably on a restore it is *IMPOSSIBLE* to make the situation worse. [SNIP] > It seems not a bad idea to set aside the NSD headers of your NSDs in a > back up :-) > And also now: Before amending any blocks on your disks, save them! > It's called NSD v2 descriptor format, so rather than use raw disks they are in a GPT partition, and for good measure a backup copy is stored at the end of the disk too. Personally if I had any v1 NSD's in a file system I would have a plan for a series of mmdeldisk/mmcrnsd/mmadddisk to get them all to v2 sooner rather than later. JAB. -- Jonathan A. 
Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Robert.Oesterlin at nuance.com Thu Jul 27 16:28:02 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 27 Jul 2017 15:28:02 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format each is? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jul 27 16:51:29 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 27 Jul 2017 17:51:29 +0200 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501168171.26563.56.camel@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> Message-ID: gpfsug-discuss-bounces at spectrumscale.org wrote on 07/27/2017 05:09:31 PM: > From: Jonathan Buzzard > To: gpfsug main discussion list > Date: 07/27/2017 05:09 PM > Subject: Re: [gpfsug-discuss] Lost disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On Thu, 2017-07-27 at 16:18 +0200, Uwe Falke wrote: > > > "Just doing something" makes things worse usually. Whether a 3rd > > party tool knows how to handle GPFS NSDs can be doubted (as long as it > > is not dedicated to that purpose). > > It might usually, but IBM have *ALREADY* given up in this case and told > the customer their data is toast. Under these circumstances other than > wasting time that could have been spent profitably on a restore it is > *IMPOSSIBLE* to make the situation worse. SCNR: It is always possible to make things worse. However, of course, if the efforts to do research on that system appear too expensive compared to the possible gain, then it is wise to give up and restore data from backup to a new file system. > > [SNIP] > > > It seems not a bad idea to set aside the NSD headers of your NSDs in a > > back up :-) > > And also now: Before amending any blocks on your disks, save them! > > > > It's called NSD v2 descriptor format, so rather than use raw disks they > are in a GPT partition, and for good measure a backup copy is stored at > the end of the disk too. > > Personally if I had any v1 NSD's in a file system I would have a plan > for a series of mmdeldisk/mmcrnsd/mmadddisk to get them all to v2 sooner > rather than later. Yep, but I suppose the gone NSDs were v1. Then, there might be some restrictions blocking the move from NSDv1 to NSDv2 (old FS level still req.ed, or just the hugeness of a file system). And you never know, if some tool runs wild due to logical failures it overwrites all GPT copies on a disk and you're lost again (but of course NSDv2 has been a tremendous step ahead). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From luke.raimbach at googlemail.com Thu Jul 27 17:09:42 2017 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Thu, 27 Jul 2017 16:09:42 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> References: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: mmfsadm test readdescraw On Thu, 27 Jul 2017, 16:28 Oesterlin, Robert, wrote: > I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format > each is? > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Jul 27 17:17:20 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 27 Jul 2017 16:17:20 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <50669E00-32A8-4AC7-A729-CB961F96ECAE@nuance.com> Right - but what field do I look at? Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Luke Raimbach Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 11:10 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? mmfsadm test readdescraw -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Jul 27 19:26:45 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 19:26:45 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> Message-ID: <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> On 27/07/17 16:51, Uwe Falke wrote: [SNIP] > SCNR: It is always possible to make things worse. > However, of course, if the efforts to do research on that system appear > too expensive compared to the possible gain, then it is wise to give up > and restore data from backup to a new file system. > Explain to me when IBM have washed their hands of the situation; that is they deem the file system unrecoverable and will take no further action to help the customer, how under these circumstances it is possible for it to get any worse attempting to recover the situation yourself? The answer is you can't so and are talking complete codswallop. In general you are right, in this situation you are utterly and totally wrong. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From chair at spectrumscale.org Thu Jul 27 21:19:15 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Thu, 27 Jul 2017 21:19:15 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> Message-ID: Guys, this is supposed to be a community mailing list where people can come and ask questions and we can have healthy debate, but please can we keep it calm? Thanks Simon Group Chair From sfadden at us.ibm.com Thu Jul 27 21:33:19 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Thu, 27 Jul 2017 20:33:19 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: References: , <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Jul 28 00:29:47 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 28 Jul 2017 00:29:47 +0100 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> References: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: On 27/07/17 16:28, Oesterlin, Robert wrote: > I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format > each is? Well on anything approaching a recent Linux lsblk should as I understand it should show GPT partitions on v2 NSD's. Normally a v1 NSD would show up as a raw block device. I guess you could have created the v1 NSD's inside a partition but that was not normal practice. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From daniel.kidger at uk.ibm.com Fri Jul 28 12:03:40 2017 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Fri, 28 Jul 2017 11:03:40 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: , <1501156730.26563.49.camel@strath.ac.uk><1501168171.26563.56.camel@strath.ac.uk><3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 28 12:46:47 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 28 Jul 2017 11:46:47 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Scott Fadden Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 3:33 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? 
# mmfsadm test readdescraw /dev/dm-14 | grep " original format" original format version 1600, cur version 1700 (mgr 1700, helper 1700, mnode 1700) The harder part is what version number = v2 and what matches version 1. The real answer is there is not a simple one, it is not really v1 vs v2 it is what feature you are interested in. Just one small example 4K Disk SECTOR support started in 1403 Dynamically enabling quotas started in 1404 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 28 13:44:11 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 28 Jul 2017 12:44:11 +0000 Subject: [gpfsug-discuss] LROC example Message-ID: <8103C497-EFA2-41E3-A047-4C3A3AA3EC0B@nuance.com> For those of you considering LROC, you may find this interesting. LROC can be very effective in some job mixes, as shown below. This is in a compute cluster of about 400 nodes. Each compute node has a 100GB LROC. In this particular job mix, LROC was recalling 3-4 times the traffic that was going to the NSDs. I see other cases where?s it?s less effective. [cid:image001.png at 01D30775.4ACF3D20] Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 54425 bytes Desc: image001.png URL: From knop at us.ibm.com Fri Jul 28 13:44:26 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 28 Jul 2017 08:44:26 -0400 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> References: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Message-ID: Bob, I believe the NSD format version (v1 vs v2) is shown in the " format version" line that starts with "NSDid" : # mmfsadm test readdescraw /dev/dm-11 NSD descriptor in sector 64 of /dev/dm-11 NSDid: 9461C0A85788693A format version: 1403 Label: It should say "1403" when the format is v2. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 07/28/2017 07:47 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Scott Fadden Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 3:33 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? 
# mmfsadm test readdescraw /dev/dm-14 | grep " original format" original format version 1600, cur version 1700 (mgr 1700, helper 1700, mnode 1700) The harder part is what version number = v2 and what matches version 1. The real answer is there is not a simple one, it is not really v1 vs v2 it is what feature you are interested in. Just one small example 4K Disk SECTOR support started in 1403 Dynamically enabling quotas started in 1404 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gcorneau at us.ibm.com Fri Jul 28 20:07:54 2017 From: gcorneau at us.ibm.com (Glen Corneau) Date: Fri, 28 Jul 2017 14:07:54 -0500 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: References: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Message-ID: Just a note for my AIX folks out there (and I know there's at least one!): When NSDv2 (version 1403) disks are defined in AIX we *don't* create GPTs on those LUNs. However with GPFS (Spectrum Scale) installed on AIX we will place the NSD name in the "VG" column of lsvg. But yes, we've had situations of customers creating new VGs on existing GPFS LUNs (force!) and destroying file systems. ------------------ Glen Corneau Power Systems Washington Systems Center gcorneau at us.ibm.com From: "Felipe Knop" To: gpfsug main discussion list Date: 07/28/2017 07:45 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Bob, I believe the NSD format version (v1 vs v2) is shown in the " format version" line that starts with "NSDid" : # mmfsadm test readdescraw /dev/dm-11 NSD descriptor in sector 64 of /dev/dm-11 NSDid: 9461C0A85788693A format version: 1403 Label: It should say "1403" when the format is v2. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 07/28/2017 07:47 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Sun Jul 30 04:22:25 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Sat, 29 Jul 2017 23:22:25 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? 
In-Reply-To: <1500908233.4387.194.camel@buzzard.me.uk> References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> <1500908233.4387.194.camel@buzzard.me.uk> Message-ID: Jonathan, all, We'll be introducing some clarification into the publications to highlight that data is not stored in the inode for encrypted files. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/24/2017 10:57 AM Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Sent by: gpfsug-discuss-bounces at spectrumscale.org On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: > Hey all, > > On the documentation of encryption restrictions and encryption/HAWC > interplay... > > The encryption documentation currently states: > > "Secure storage uses encryption to make data unreadable to anyone who > does not possess the necessary encryption keys...Only data, not > metadata, is encrypted." > > The HAWC restrictions include: > > "Encrypted data is never stored in the recovery log..." > > If this is unclear, I'm open to suggestions for improvements. > Just because *DATA* is stored in the metadata does not make it magically metadata. It's still data so you could quite reasonably conclude that it is encrypted. We have now been disabused of this, but the documentation is not clear and needs clarifying. Perhaps say metadata blocks are not encrypted. Or just a simple data stored in inodes is not encrypted would suffice. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Jul 31 05:57:44 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 31 Jul 2017 00:57:44 -0400 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501153088.26563.39.camel@buzzard.me.uk> References: <1501153088.26563.39.camel@buzzard.me.uk> Message-ID: Jonathan, Regarding >> Thing is GPFS does not look at the NSD descriptors that much. So in my >> case it was several days before it was noticed, and only then because I >> rebooted the last NSD server as part of a rolling upgrade of GPFS. I >> could have cruised for weeks/months with no NSD descriptors if I had not >> restarted all the NSD servers. The moral of this is the overwrite could >> have take place quite some time ago. While GPFS does not normally read the NSD descriptors in the course of performing file system operations, as of 4.1.1 a periodic check is done on the content of various descriptors, and a message like [E] On-disk NSD descriptor of is valid but has a different ID. 
ID in cache is and ID on-disk is should get issued if the content of the descriptor on disk changes. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/27/2017 06:58 AM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, 2017-07-26 at 17:45 +0000, Oesterlin, Robert wrote: > One way this could possible happen would be a system is being > installed (I?m assuming this is Linux) and the FC adapter is active; > then the OS install will see disks and wipe out the NSD descriptor on > those disks. (Which is why the NSD V2 format was invented, to prevent > this from happening) If you don?t lose all of the descriptors, it?s > sometimes possible to manually re-construct the missing header > information - I?m assuming since you opened a PMR, IBM has looked at > this. This is a scenario I?ve had to recover from - twice. Back-end > array issue seems unlikely to me, I?d keep looking at the systems with > access to those LUNs and see what commands/operations could have been > run. I would concur that this is the most likely scenario; an install where for whatever reason the machine could see the disks and they are gone. I know that RHEL6 and its derivatives will do that for you. Has happened to me at previous place of work where another admin forgot to de-zone a server, went to install CentOS6 as part of a cluster upgrade from CentOS5 and overwrote all the NSD descriptors. Thing is GPFS does not look at the NSD descriptors that much. So in my case it was several days before it was noticed, and only then because I rebooted the last NSD server as part of a rolling upgrade of GPFS. I could have cruised for weeks/months with no NSD descriptors if I had not restarted all the NSD servers. The moral of this is the overwrite could have take place quite some time ago. Basically if the disks are all missing then the NSD descriptor has been overwritten, and the protestations of the client are irrelevant. The chances of the disk array doing it to *ALL* the disks is somewhere around ? IMHO. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Mon Jul 31 18:30:34 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 31 Jul 2017 17:30:34 +0000 Subject: [gpfsug-discuss] Auditing Message-ID: Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? 
Am I barking up the wrong tree here, or is there a better way to get this type of data from a Spectrum Scale filesystem?

Mark

From Renar.Grunenberg at huk-coburg.de Mon Jul 31 18:44:21 2017
From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar)
Date: Mon, 31 Jul 2017 17:44:21 +0000
Subject: [gpfsug-discuss] Quota and hardlimit enforcement
Message-ID: <200a086c1740448da544e667c03887e5 at SMXRF105.msg.hukrf.de>

Hallo All,
we are on version 4.2.3.2 and see some misunderstanding in the enforcement of hard-limit definitions on a fileset quota. What we see is this: we put some 200 GB files onto the following quota definitions: quota 150 GB, limit 250 GB, grace none. After creating one 200 GB file we hit the soft quota limit; that's OK. But after the second file was created we expected an I/O error, and it doesn't happen. We have defined all the well-known parameters (-Q, ...) on the filesystem. Is this a bug or a feature? mmcheckquota was already run first.
Regards Renar.

Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon: 09561 96-44110
Telefax: 09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet: www.huk.de
________________________________
HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas (stv.).
________________________________
Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.

This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________

From eric.wonderley at vt.edu Mon Jul 31 18:54:52 2017
From: eric.wonderley at vt.edu (J.
Eric Wonderley) Date: Mon, 31 Jul 2017 13:54:52 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar < Renar.Grunenberg at huk-coburg.de> wrote: > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the enforcement > of hardlimit definitions on a flieset quota. What we see is we put some 200 > GB files on following quota definitions: quota 150 GB Limit 250 GB Grace > none. > After the creating of one 200 GB we hit the softquota limit, thats ok. But > After the the second file was created!! we expect an io error but it don?t > happen. We define all well know Parameters (-Q,..) on the filesystem . Is > this a bug or a Feature? mmcheckquota are already running at first. > Regards Renar. > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ------------------------------ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > ------------------------------ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfosburg at mdanderson.org Mon Jul 31 18:56:46 2017 From: jfosburg at mdanderson.org (Fosburgh,Jonathan) Date: Mon, 31 Jul 2017 17:56:46 +0000 Subject: [gpfsug-discuss] Auditing In-Reply-To: References: Message-ID: At present there is not a method to audit file access. Jonathan Fosburgh Principal Application Systems Analyst Storage Team IT Operations jfosburg at mdanderson.org (713) 745-9346 On 07/31/2017 12:30 PM, Mark Bush wrote: Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? 
Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The information contained in this e-mail message may be privileged, confidential, and/or protected from disclosure. This e-mail message may contain protected health information (PHI); dissemination of PHI should comply with applicable federal and state laws. If you are not the intended recipient, or an authorized representative of the intended recipient, any further review, disclosure, use, dissemination, distribution, or copying of this message or any attachment (or the information contained therein) is strictly prohibited. If you think that you have received this e-mail message in error, please notify the sender by return e-mail and delete all references to it and its contents from your systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jul 31 19:02:30 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 31 Jul 2017 18:02:30 +0000 Subject: [gpfsug-discuss] Re Auditing Message-ID: We run a policy that looks like this: -- cut here -- define(daysToEpoch, days(timestamp('1970-01-01 00:00:00.0'))) define(unixTS, char(int( (( days(\$1) - daysToEpoch ) * 86400) + ( hour(\$1) * 3600) + (minute(\$1) * 60) + (second(\$1)) )) ) rule 'dumpall' list '"$filesystem"' DIRECTORIES_PLUS SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' ) -- cut here -- Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Monday, July 31, 2017 at 12:31 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Auditing Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Mon Jul 31 19:05:37 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 31 Jul 2017 18:05:37 +0000 Subject: [gpfsug-discuss] Re Auditing In-Reply-To: References: Message-ID: Brilliant. Thanks Bob. 
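For readers of the archive, one way such a policy is typically driven (a sketch, not necessarily the wrapper used above: the file system name, the list name and the output prefix are assumptions, and the list name in the rule above is expanded from $filesystem by whatever preprocesses the file) is with mmapplypolicy in defer mode so the records land in a flat file:

# audit.pol holds the rule above plus a matching external list declaration, e.g.:
#   RULE EXTERNAL LIST 'audit' EXEC ''
#   RULE 'dumpall' LIST 'audit' DIRECTORIES_PLUS SHOW( ... )
mmapplypolicy gpfs01 -P audit.pol -I defer -f /tmp/audit -L 1
# The '|'-delimited records should end up in /tmp/audit.list.audit, one line per selected file.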
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com]
Sent: Monday, July 31, 2017 1:03 PM
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Re Auditing

We run a policy that looks like this:

-- cut here --
define(daysToEpoch, days(timestamp('1970-01-01 00:00:00.0')))
define(unixTS, char(int( (( days(\$1) - daysToEpoch ) * 86400) + ( hour(\$1) * 3600) + (minute(\$1) * 60) + (second(\$1)) )) )

rule 'dumpall' list '"$filesystem"' DIRECTORIES_PLUS
SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' )
-- cut here --

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: on behalf of Mark Bush
Reply-To: gpfsug main discussion list
Date: Monday, July 31, 2017 at 12:31 PM
To: "gpfsug-discuss at spectrumscale.org"
Subject: [EXTERNAL] [gpfsug-discuss] Auditing

Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree here, or is there a better way to get this type of data from a Spectrum Scale filesystem?

From makaplan at us.ibm.com Mon Jul 31 19:26:52 2017
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Mon, 31 Jul 2017 14:26:52 -0400
Subject: [gpfsug-discuss] Re Auditing - timestamps
In-Reply-To: References: Message-ID:

The "ILM" chapter in the Admin Guide has some tips, among which:

18. You can convert a time interval value to a number of seconds with the SQL cast syntax, as in the following example:

define([toSeconds],[(($1) SECONDS(12,6))])
define([toUnixSeconds],[toSeconds($1 - '1970-1-1@0:00')])
RULE external list b
RULE list b SHOW('sinceNow=' toSeconds(current_timestamp-modification_time) )
RULE external list c
RULE list c SHOW('sinceUnixEpoch=' toUnixSeconds(modification_time) )

The following method is also supported:

define(access_age_in_days,( INTEGER(( (CURRENT_TIMESTAMP - ACCESS_TIME) SECONDS)) /(24*3600.0) ) )
RULE external list w exec ''
RULE list w weight(access_age_in_days) show(access_age_in_days)

--marc of GPFS

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From pinto at scinet.utoronto.ca Mon Jul 31 19:46:53 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 31 Jul 2017 14:46:53 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: <20170731144653.160355y5whmerokd@support.scinet.utoronto.ca> Renar For as long as the usage is below the hard limit (space or inodes) and below the grace period you'll be able to write. I don't think you can set the grace period to an specific value as a quota parameter, such as none. That is set at the filesystem creation time. BTW, grace period limit has been a mystery to me for many years. My impression is that GPFS keeps changing it internally depending on the position of the moon. I think ours is 2 hours, but at times I can see users writing for longer. Jaime Quoting "Grunenberg, Renar" : > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the > enforcement of hardlimit definitions on a flieset quota. What we see > is we put some 200 GB files on following quota definitions: quota > 150 GB Limit 250 GB Grace none. > After the creating of one 200 GB we hit the softquota limit, thats > ok. But After the the second file was created!! we expect an io > error but it don?t happen. We define all well know Parameters > (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota > are already running at first. > Regards Renar. > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
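If the goal in this thread is to have application writes stop at the hard limit rather than ride out a grace period, one common approach (a sketch; the file system name gpfs01 and fileset name appdata are made up) is to set the soft and hard block limits to the same value and then watch the in-doubt column:

# Set soft limit = hard limit on the fileset so there is no grace window to ride out
mmsetquota gpfs01:appdata --block 250G:250G
# Verify what is charged against the fileset, including the in-doubt value
mmlsquota -j appdata gpfs01
mmrepquota -j gpfs01

In-doubt shares can still let usage overshoot the hard limit briefly on a busy cluster, so a small margin below the real stop line is worth keeping.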
From Renar.Grunenberg at huk-coburg.de Mon Jul 31 20:04:56 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 19:04:56 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 
9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 31 20:21:46 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 31 Jul 2017 19:21:46 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar, I?m sure this is the case, but I don?t see anywhere in this thread where this is explicitly stated ? you?re not doing your tests as root, are you? root, of course, is not bound by any quotas. Kevin On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > wrote: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. 
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Mon Jul 31 20:30:20 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 19:30:20 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hallo Kevin, thanks for your hint i will check these tomorrow, and yes as root, lol. Regards Renar Renar Grunenberg Abteilung Informatik ? 
Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Buterbaugh, Kevin L Gesendet: Montag, 31. Juli 2017 21:22 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar, I?m sure this is the case, but I don?t see anywhere in this thread where this is explicitly stated ? you?re not doing your tests as root, are you? root, of course, is not bound by any quotas. Kevin On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > wrote: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. 
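For reference, explicit fileset block limits of the kind described in the quoted message are normally set along these lines on a 4.2.x cluster (device and fileset names are placeholders, and the exact syntax is worth confirming against the mmsetquota man page for the installed release):

    # 150 GB soft / 250 GB hard block limit on the fileset
    mmsetquota gpfs01:myfileset --block 150G:250G
    # confirm the limits and watch the grace column once the soft limit is exceeded
    mmlsquota -j myfileset gpfs01
    # default block/file grace periods can be edited with mmedquota -t (here for fileset quotas)
    mmedquota -t -j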
If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Mon Jul 31 21:03:53 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 31 Jul 2017 16:03:53 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> In addition, the in_doubt column is a function of the data turnover and the internal GPFS accounting synchronization period (beyond root control). The higher the in_doubt values, the less accurate the real amount of space/inodes a user/group/fileset has in the filesystem. What I noticed in practice is that the in_doubt values only get worse over time, and work against the quotas, making them hit the limits sooner. Therefore, you may wish to run an 'mmcheckquota' crontab job once or twice a day, to reset the in_doubt column to zero more often. GPFS has a very high lag to do this on its own in the most recent versions, and seldom really catches up on a very active filesystem. If your grace period is set to 7 days I can assure you that in an HPC environment it is effectively the equivalent of not having quotas. You should set it to 2 hours or 4 hours. In an environment such as ours a runaway process can easily generate 500TB of data or 1 billion inodes in a few hours, and choke the file system for all users/jobs. Jaime Quoting "Buterbaugh, Kevin L" : > Hi Renar, > > I'm sure this is the case, but I don't see anywhere in this thread > where this is explicitly stated - you're not doing your tests as > root, are you? root, of course, is not bound by any quotas. > > Kevin > > On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > > > wrote: > > > Hallo J. Eric, hallo Jaime, > Ok after we hit the softlimit we see that the grace period goes to > 7 days. I think that's the default. But what does it mean? > After we reach the "hard" limit we additionally see the GBytes in_doubt. > My interpretation: we can now write many GB until the no-space-left event > in the filesystem. > But our intention is to restrict some applications to write only up to > the hard limit in the fileset. Any hints to accomplish this? > > > > Renar Grunenberg > Abteilung Informatik - Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ________________________________ > HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrtümlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information.
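A crontab entry of the kind Jaime suggests above could look roughly like this sketch (file system name, schedule and log path are illustrative only):

    # /etc/cron.d/mmcheckquota -- reconcile the in_doubt quota figures twice a day
    0 6,18 * * * root /usr/lpp/mmfs/bin/mmcheckquota gpfs01 >> /var/log/mmcheckquota.log 2>&1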
> Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > > > > Von: > gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric > Wonderley > Gesendet: Montag, 31. Juli 2017 19:55 > An: gpfsug main discussion list > > > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement > > Hi Renar: > What does 'mmlsquota -j fileset filesystem' report? > I did not think you would get a grace period of none unless the > hardlimit=softlimit. > > On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > > > wrote: > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the > enforcement of hardlimit definitions on a flieset quota. What we see > is we put some 200 GB files on following quota definitions: quota > 150 GB Limit 250 GB Grace none. > After the creating of one 200 GB we hit the softquota limit, thats > ok. But After the the second file was created!! we expect an io > error but it don?t happen. We define all well know Parameters > (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota > are already running at first. > Regards Renar. > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > > Telefon: > > 09561 96-44110 > > Telefax: > > 09561 96-44104 > > E-Mail: > > Renar.Grunenberg at huk-coburg.de > > Internet: > > www.huk.de > > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. 
(MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 31 21:11:14 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 31 Jul 2017 20:11:14 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> Message-ID: <3789811E-523F-47AE-93F3-E2985DD84D60@vanderbilt.edu> Jaime, That?s heavily workload dependent. We run a traditional HPC cluster and have a 7 day grace on home and 14 days on scratch. By setting the soft and hard limits appropriately we?ve slammed the door on many a runaway user / group / fileset. YMMV? Kevin On Jul 31, 2017, at 3:03 PM, Jaime Pinto > wrote: If your grace period is set to 7 days I can assure you that in an HPC environment it's the equivalent of not having quotas effectively. You should set it to 2 hours or 4 hours. In an environment such as ours a runway process can easily generate 500TB of data or 1 billion inodes in few hours, and choke the file system to all users/jobs. Jaime ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Sat Jul 1 10:20:18 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Sat, 1 Jul 2017 10:20:18 +0100 Subject: [gpfsug-discuss] Mass UID migration suggestions In-Reply-To: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> References: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> Message-ID: On 30/06/17 16:20, hpc-luke at uconn.edu wrote: > Hello, > > We're trying to change most of our users uids, is there a clean way to > migrate all of one users files with say `mmapplypolicy`? We have to change the > owner of around 273539588 files, and my estimates for runtime are around 6 days. > > What we've been doing is indexing all of the files and splitting them up by > owner which takes around an hour, and then we were locking the user out while we > chown their files. I made it multi threaded as it weirdly gave a 10% speedup > despite my expectation that multi threading access from a single node would not > give any speedup. > > Generally I'm looking for advice on how to make the chowning faster. Would > spreading the chowning processes over multiple nodes improve performance? Should > I not stat the files before running lchown on them, since lchown checks the file > before changing it? I saw mention of inodescan(), in an old gpfsug email, which > speeds up disk read access, by not guaranteeing that the data is up to date. We > have a maintenance day coming up where all users will be locked out, so the file > handles(?) from GPFS's perspective will not be able to go stale. Is there a > function with similar constraints to inodescan that I can use to speed up this > process? My suggestion is to do some development work in C to write a custom program to do it for you. That way you can hook into the GPFS API to leverage the fast file system scanning API. Take a look at the tsbackup.C file in the samples directory. 
Obviously this is going to require someone with appropriate coding skills to develop. On the other hand given it is a one off and input is strictly controlled so error checking is a one off, then couple hundred lines C tops. My tip for this would be load the new UID's into a sparse array so you can just use the current UID to index into the array for the new UID, for speeding things up. It burns RAM but these days RAM is cheap and plentiful and speed is the major consideration here. This should in theory be able to do this in a few hours with this technique. One thing to bear in mind is that once the UID change is complete you will have to backup the entire file system again. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From ilan84 at gmail.com Tue Jul 4 09:16:43 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 11:16:43 +0300 Subject: [gpfsug-discuss] Fail to mount file system Message-ID: Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I am trying to make it work. There are 2 nodes in a cluster: [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active The Cluster status is: [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: MyCluster.LH20-GPFS2 GPFS cluster id: 10777108240438931454 GPFS UID domain: MyCluster.LH20-GPFS2 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 There is a file system: [root at LH20-GPFS1 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- fs_gpfs01 nynsd1 (directly attached) fs_gpfs01 nynsd2 (directly attached) [root at LH20-GPFS1 ~]# On each Node, There is folder /fs_gpfs01 The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. Whilte executing mmmount i get exception: [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. What am i doing wrong ? From scale at us.ibm.com Tue Jul 4 09:36:43 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 14:06:43 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 lab"? Is the file system corrupted ? Maybe this error is then due to file system corruption. Can you once try: mmmount fs_gpfs01 -a If this does not work then try: mmmount -o rs fs_gpfs01 Let me know which mount is working. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: gpfsug-discuss at spectrumscale.org Date: 07/04/2017 01:47 PM Subject: [gpfsug-discuss] Fail to mount file system Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I am trying to make it work. There are 2 nodes in a cluster: [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active The Cluster status is: [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: MyCluster.LH20-GPFS2 GPFS cluster id: 10777108240438931454 GPFS UID domain: MyCluster.LH20-GPFS2 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 There is a file system: [root at LH20-GPFS1 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- fs_gpfs01 nynsd1 (directly attached) fs_gpfs01 nynsd2 (directly attached) [root at LH20-GPFS1 ~]# On each Node, There is folder /fs_gpfs01 The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. Whilte executing mmmount i get exception: [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. What am i doing wrong ? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 09:38:28 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 11:38:28 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: I mean the person tried to configure it... didnt do good job so now its me to continue On Jul 4, 2017 11:37, "IBM Spectrum Scale" wrote: > What exactly do you mean by "I have received existing corrupted GPFS > 4.2.2 lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. 
> > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > ------------------------------------------------------------ > --------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine > cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jul 4 11:54:52 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 4 Jul 2017 10:54:52 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Message-ID: Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... 
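In practice the documented all-nodes procedure boils down to something like the sketch below (the package file name is a placeholder, and the authoritative sequence is the knowledge center page linked above):

    # confirm which gpfs.smb build each protocol node currently runs
    mmdsh -N cesNodes rpm -q gpfs.smb
    # stop SMB on all protocol nodes - this is the brief outage window
    mmces service stop SMB -a
    # install the new package on every protocol node
    mmdsh -N cesNodes 'rpm -Uvh /tmp/gpfs.smb-<new-version>.x86_64.rpm'
    # restart SMB once every node is on the same version
    mmces service start SMB -a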
Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 11:56:20 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 13:56:20 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. 
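Errors like the "Wrong medium type" and "Stale file handle" shown above are usually explained in the GPFS daemon log, so a first pass at debugging might look like this sketch (standard log location, nothing here is destructive):

    # most recent GPFS daemon log on the node where the mount failed
    tail -n 100 /var/adm/ras/mmfs.log.latest
    # kernel messages with readable timestamps, to see whether the xfs warning is old boot noise
    dmesg -T | tail -n 50
    # check whether any node currently has the file system mounted
    mmlsmount fs_gpfs01 -L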
> > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts From S.J.Thompson at bham.ac.uk Tue Jul 4 12:09:18 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 4 Jul 2017 11:09:18 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Message-ID: AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I?ve upgraded nodes one at a time over the course of a few days. 
Is the impact just that we won?t be supported, or will a hole open up beneath my feet and swallow me whole? I really don?t fancy the headache of getting approvals to get an outage of even 5 minutes at 6am?. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jul 4 12:12:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 4 Jul 2017 11:12:10 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 4 17:28:07 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 21:58:07 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: My bad gave the wrong command, the right one is: mmmount fs_gpfs01 -o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
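The information requested above can be gathered with something along these lines (output will differ per node; the read-only mmfsck is only a way to confirm suspected corruption and needs the file system unmounted everywhere):

    # NSD to local device mapping, device type and server list, as asked for above
    mmlsnsd -X
    # retry the mount with the options suggested above
    mmmount fs_gpfs01 -o rs
    # optional: read-only structure check, reports problems without repairing anything
    mmfsck fs_gpfs01 -n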
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 17:46:17 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 19:46:17 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Yes I am ok with deleting. I follow a guide from john olsen at the ibm team from tuscon.. but the guide had steps after the gpfs setup... Is there step by step guide for gpfs cluster setup other than the one in the ibm site? Thank My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------ ------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111- 0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system ------------------------------ [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > ------------------------------------------------------------ --------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcatana at gmail.com Tue Jul 4 17:47:09 2017 From: jcatana at gmail.com (Josh Catana) Date: Tue, 4 Jul 2017 12:47:09 -0400 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Check /var/adm/ras/mmfs.log.latest The dmesg xfs bug is probably from boot if you look at the dmesg with -T to show the timestamp On Jul 4, 2017 12:29 PM, "IBM Spectrum Scale" wrote: > My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs > > Also can you send output of mmlsnsd -X, need to check device type of the > NSDs. > > Are you ok with deleting the file system and disks and building everything > from scratch? > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. 
> > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: IBM Spectrum Scale > Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main > discussion list > Date: 07/04/2017 04:26 PM > Subject: Re: [gpfsug-discuss] Fail to mount file system > ------------------------------ > > > > [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a > Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... > LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmdsh: LH20-GPFS1 remote shell process had return code 32. > LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle > mmdsh: LH20-GPFS2 remote shell process had return code 32. > mmmount: Command failed. Examine previous error messages to determine > cause. > > [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 > mmmount: Mount point can not be a relative path name: rs > [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 > mmmount: Mount point can not be a relative path name: rs > > > > I recieve in "dmesg": > > [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk > [ 141.363422] hvt_cn_callback: unexpected netlink message! > [ 141.366153] hvt_cn_callback: unexpected netlink message! > [ 4479.292850] tracedev: loading out-of-tree module taints kernel. > [ 4479.292888] tracedev: module verification failed: signature and/or > required key missing - tainting kernel > [ 4482.928413] ------------[ cut here ]------------ > [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 > xfs_do_writepage+0x537/0x550 [xfs]() > [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) > tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 > mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils > i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc > binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif > crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc > hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy > libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod > [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE > ------------ 3.10.0-514.21.2.el7.x86_64 #1 > > On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale > wrote: > > What exactly do you mean by "I have received existing corrupted GPFS > 4.2.2 > > lab"? > > Is the file system corrupted ? Maybe this error is then due to file > system > > corruption. > > > > Can you once try: mmmount fs_gpfs01 -a > > If this does not work then try: mmmount -o rs fs_gpfs01 > > > > Let me know which mount is working. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------ > ------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > > (GPFS), then please post it to the public IBM developerWroks Forum at > > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) > > and you have an IBM software maintenance contract please contact > > 1-800-237-5511 <(800)%20237-5511> in the United States or your local > IBM Service Center in > > other countries. > > > > The forum is informally monitored as time permits and should not be used > for > > priority messages to the Spectrum Scale (GPFS) team. 
> > > > > > > > From: Ilan Schwarts > > To: gpfsug-discuss at spectrumscale.org > > Date: 07/04/2017 01:47 PM > > Subject: [gpfsug-discuss] Fail to mount file system > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ________________________________ > > > > > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > > am trying to make it work. > > There are 2 nodes in a cluster: > > [root at LH20-GPFS1 ~]# mmgetstate -a > > > > Node number Node name GPFS state > > ------------------------------------------ > > 1 LH20-GPFS1 active > > 3 LH20-GPFS2 active > > > > The Cluster status is: > > [root at LH20-GPFS1 ~]# mmlscluster > > > > GPFS cluster information > > ======================== > > GPFS cluster name: MyCluster.LH20-GPFS2 > > GPFS cluster id: 10777108240438931454 > > GPFS UID domain: MyCluster.LH20-GPFS2 > > Remote shell command: /usr/bin/ssh > > Remote file copy command: /usr/bin/scp > > Repository type: CCR > > > > Node Daemon node name IP address Admin node name Designation > > -------------------------------------------------------------------- > > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > > > There is a file system: > > [root at LH20-GPFS1 ~]# mmlsnsd > > > > File system Disk name NSD servers > > ------------------------------------------------------------ > --------------- > > fs_gpfs01 nynsd1 (directly attached) > > fs_gpfs01 nynsd2 (directly attached) > > > > [root at LH20-GPFS1 ~]# > > > > On each Node, There is folder /fs_gpfs01 > > The next step is to mount this fs_gpfs01 to be synced between the 2 > nodes. > > Whilte executing mmmount i get exception: > > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > > mmmount: Command failed. Examine previous error messages to determine > cause. > > > > > > What am i doing wrong ? > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > -- > > > - > Ilan Schwarts > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 4 19:15:49 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 23:45:49 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: You can refer to the concepts, planning and installation guide at the link ( https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1xx_library_prodoc.htm ) for finding detailed steps on setting up a cluster or creating a file system. Or open a PMR and work with IBM support to set it up. 
In your case (just as an example) you can use the below simple steps to delete and recreate the file system: 1) To delete file system and NSDs: a) Unmount file system - mmumount -a b) Delete file system - mmdelfs c) Delete NSDs - mmdelnsd "nynsd1;nynsd2" 2) To create file system with both disks in one system pool and having dataAndMetadata and data and metadata replica and directly attached to the nodes, you can use following steps: a) Create a /tmp/nsd file and fill it up with below information :::dataAndMetadata:1:nynsd1:system :::dataAndMetadata:2:nynsd2:system b) Use mmcrnsd -F /tmp/nsd to create NSDs c) Create file system using (just an example with assumptions on config) - mmcrfs /dev/fs_gpfs01 -F /tmp/nsd -A yes -B 256K -n 32 -m 2 -r 2 -T /fs_gpfs01 You can refer to above guide for configuring it in other ways as you want. If you have any issues with these steps you can raise PMR and follow proper channel to setup file system as well. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 10:16 PM Subject: Re: [gpfsug-discuss] Fail to mount file system Yes I am ok with deleting. I follow a guide from john olsen at the ibm team from tuscon.. but the guide had steps after the gpfs setup... Is there step by step guide for gpfs cluster setup other than the one in the ibm site? Thank My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... 
LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Wed Jul 5 08:02:19 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 10:02:19 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Hi, [root at LH20-GPFS2 ~]# mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- nynsd1 0A0A9E3D594D5CA8 - - LH20-GPFS2 (not found) directly attached nynsd2 0A0A9E3D594D5CA9 - - LH20-GPFS2 (not found) directly attached mmmount failed with -o rs root at LH20-GPFS2 ~]# mmmount fs_gpfs01 -o rs Wed Jul 5 09:58:29 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. and in logs /var/adm/ras/mmfs.log.latest: 2017-07-05_09:58:30.009+0300: [I] Command: mount fs_gpfs01 2017-07-05_09:58:30.890+0300: Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: Wrong medium type 2017-07-05_09:58:30.890+0300: [E] Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: [W] Command: err 48: mount fs_gpfs01 From scale at us.ibm.com Wed Jul 5 08:44:19 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 5 Jul 2017 13:14:19 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: >From mmlsnsd output can see that the disks are not found by gpfs (maybe some connection issue or they have been changed/removed from backend) Please open a PMR and work with IBM support to resolve this. 
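If the disks are physically present and are expected to be visible on these nodes, it is also worth checking whether GPFS device discovery is finding them at all; devices on paths that GPFS does not scan by default can be listed explicitly through the nsddevices user exit. A minimal sketch, with sdb and sdc as placeholders for the real devices (the template under /usr/lpp/mmfs/samples/nsddevices.sample has the full details):

cat > /var/mmfs/etc/nsddevices <<'EOF'
#!/bin/bash
# Minimal nsddevices sketch: print one candidate device per line
# in the form "deviceName deviceType". sdb and sdc are placeholders.
echo "sdb generic"
echo "sdc generic"
# exit 0 = use only the devices listed above (skip built-in discovery)
# exit 1 = also run GPFS's built-in device discovery afterwards
exit 1
EOF
chmod +x /var/mmfs/etc/nsddevices

Once discovery can see the disks, mmlsnsd -X should show a device path and devtype for each NSD instead of "(not found)".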
Regards, The Spectrum Scale (GPFS) team
------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale
(GPFS), then please post it to the public IBM developerWroks Forum at
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 .

If your query concerns a potential software error in Spectrum Scale (GPFS)
and you have an IBM software maintenance contract please contact
1-800-237-5511 in the United States or your local IBM Service Center in
other countries.

The forum is informally monitored as time permits and should not be used
for priority messages to the Spectrum Scale (GPFS) team.

From:   Ilan Schwarts
To:     IBM Spectrum Scale
Cc:     gpfsug main discussion list, gpfsug-discuss-bounces at spectrumscale.org
Date:   07/05/2017 12:32 PM
Subject:        Re: [gpfsug-discuss] Fail to mount file system

Hi,
[root at LH20-GPFS2 ~]# mmlsnsd -X

 Disk name   NSD volume ID      Device   Devtype   Node name    Remarks
---------------------------------------------------------------------------------------------------
 nynsd1      0A0A9E3D594D5CA8   -        -         LH20-GPFS2   (not found) directly attached
 nynsd2      0A0A9E3D594D5CA9   -        -         LH20-GPFS2   (not found) directly attached

mmmount failed with -o rs
root at LH20-GPFS2 ~]# mmmount fs_gpfs01 -o rs
Wed Jul 5 09:58:29 IDT 2017: mmmount: Mounting file systems ...
mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type
mmmount: Command failed. Examine previous error messages to determine cause.

and in logs /var/adm/ras/mmfs.log.latest:
2017-07-05_09:58:30.009+0300: [I] Command: mount fs_gpfs01
2017-07-05_09:58:30.890+0300: Failed to open fs_gpfs01.
2017-07-05_09:58:30.890+0300: Wrong medium type
2017-07-05_09:58:30.890+0300: [E] Failed to open fs_gpfs01.
2017-07-05_09:58:30.890+0300: [W] Command: err 48: mount fs_gpfs01

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From UWEFALKE at de.ibm.com  Wed Jul 5 09:00:23 2017
From: UWEFALKE at de.ibm.com (Uwe Falke)
Date: Wed, 5 Jul 2017 10:00:23 +0200
Subject: [gpfsug-discuss] Fail to mount file system
In-Reply-To:
References:
Message-ID:

Hi,
maybe you need to specify your NSDs via the nsddevices user exit (a script
that identifies the local physical devices used as GPFS Network Shared
Disks (NSDs)): list the NSD devices in such a script and place it under
/var/mmfs/etc/nsddevices. There is a template under
/usr/lpp/mmfs/samples/nsddevices.sample which should provide the necessary
details.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke

IT Specialist
High Performance Computing Services / Integrated Technology Services / Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefalke at de.ibm.com
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: Andreas Hasse, Thomas Wolter
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122

From ilan84 at gmail.com  Wed Jul 5 13:12:14 2017
From: ilan84 at gmail.com (Ilan Schwarts)
Date: Wed, 5 Jul 2017 15:12:14 +0300
Subject: [gpfsug-discuss] update smb package ?
Message-ID:

Hi, while trying to enable the SMB service I receive the following:

[root at LH20-GPFS1 ~]# mmces service enable smb
LH20-GPFS1: Cannot enable SMB service on LH20-GPFS1
LH20-GPFS1: mmcesop: Prerequisite libraries not found or correct version not
LH20-GPFS1: installed. Ensure gpfs.smb is properly installed.
LH20-GPFS1: mmcesop: Command failed. Examine previous error messages to determine cause.
mmdsh: LH20-GPFS1 remote shell process had return code 1.

Do I use a normal yum update? How do I solve this issue?
Thanks

From ilan84 at gmail.com  Wed Jul 5 13:18:54 2017
From: ilan84 at gmail.com (Ilan Schwarts)
Date: Wed, 5 Jul 2017 15:18:54 +0300
Subject: [gpfsug-discuss] Fwd: update smb package ?
In-Reply-To:
References:
Message-ID:

[root at LH20-GPFS1 ~]# rpm -qa | grep gpfs
gpfs.ext-4.2.2-0.x86_64
gpfs.msg.en_US-4.2.2-0.noarch
gpfs.gui-4.2.2-0.noarch
gpfs.gpl-4.2.2-0.noarch
gpfs.gskit-8.0.50-57.x86_64
gpfs.gss.pmsensors-4.2.2-0.el7.x86_64
gpfs.adv-4.2.2-0.x86_64
gpfs.java-4.2.2-0.x86_64
gpfs.gss.pmcollector-4.2.2-0.el7.x86_64
gpfs.base-4.2.2-0.x86_64
gpfs.crypto-4.2.2-0.x86_64
[root at LH20-GPFS1 ~]# uname -a
Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20
12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root at LH20-GPFS1 ~]#

From r.sobey at imperial.ac.uk  Wed Jul 5 13:23:10 2017
From: r.sobey at imperial.ac.uk (Sobey, Richard A)
Date: Wed, 5 Jul 2017 12:23:10 +0000
Subject: [gpfsug-discuss] Fwd: update smb package ?
In-Reply-To:
References:
Message-ID:

You don't have the gpfs.smb package installed.

Yum install gpfs.smb

Or install the package manually from /usr/lpp/mmfs//smb_rpms

[root at ces ~]# rpm -qa | grep gpfs
gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan Schwarts
Sent: 05 July 2017 13:19
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Fwd: update smb package ?

[root at LH20-GPFS1 ~]# rpm -qa | grep gpfs
gpfs.ext-4.2.2-0.x86_64
gpfs.msg.en_US-4.2.2-0.noarch
gpfs.gui-4.2.2-0.noarch
gpfs.gpl-4.2.2-0.noarch
gpfs.gskit-8.0.50-57.x86_64
gpfs.gss.pmsensors-4.2.2-0.el7.x86_64
gpfs.adv-4.2.2-0.x86_64
gpfs.java-4.2.2-0.x86_64
gpfs.gss.pmcollector-4.2.2-0.el7.x86_64
gpfs.base-4.2.2-0.x86_64
gpfs.crypto-4.2.2-0.x86_64
[root at LH20-GPFS1 ~]# uname -a
Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20
12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root at LH20-GPFS1 ~]#
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ilan84 at gmail.com  Wed Jul 5 13:29:11 2017
From: ilan84 at gmail.com (Ilan Schwarts)
Date: Wed, 5 Jul 2017 15:29:11 +0300
Subject: [gpfsug-discuss] Fwd: update smb package ?
In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base | 3.6 kB 00:00:00 epel/x86_64/metalink | 24 kB 00:00:00 epel | 4.3 kB 00:00:00 extras | 3.4 kB 00:00:00 updates | 3.4 kB 00:00:00 (1/4): epel/x86_64/updateinfo | 789 kB 00:00:00 (2/4): extras/7/x86_64/primary_db | 188 kB 00:00:00 (3/4): epel/x86_64/primary_db | 4.8 MB 00:00:00 (4/4): updates/7/x86_64/primary_db | 7.7 MB 00:00:01 Loading mirror speeds from cached hostfile * base: centos.spd.co.il * epel: mirror.nonstop.co.il * extras: centos.spd.co.il * updates: centos.spd.co.il No package gpfs.smb available. Error: Nothing to do [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ something is missing in my machine :) On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: > You don't have the gpfs.smb package installed. > > > > Yum install gpfs.smb > > > > Or install the package manually from /usr/lpp/mmfs//smb_rpms > > > > [root at ces ~]# rpm -qa | grep gpfs > > gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 > > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan Schwarts > Sent: 05 July 2017 13:19 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Fwd: update smb package ? > > > > [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs > > gpfs.ext-4.2.2-0.x86_64 > > gpfs.msg.en_US-4.2.2-0.noarch > > gpfs.gui-4.2.2-0.noarch > > gpfs.gpl-4.2.2-0.noarch > > gpfs.gskit-8.0.50-57.x86_64 > > gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 > > gpfs.adv-4.2.2-0.x86_64 > > gpfs.java-4.2.2-0.x86_64 > > gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 > > gpfs.base-4.2.2-0.x86_64 > > gpfs.crypto-4.2.2-0.x86_64 > > [root at LH20-GPFS1 ~]# uname -a > > Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 > > 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux > > [root at LH20-GPFS1 ~]# > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From r.sobey at imperial.ac.uk Wed Jul 5 13:41:29 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 5 Jul 2017 12:41:29 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: Ah... yes you need to download the protocols version of gpfs from Fix Central. Same GPFS but with the SMB/Object etc packages. -----Original Message----- From: Ilan Schwarts [mailto:ilan84 at gmail.com] Sent: 05 July 2017 13:29 To: gpfsug main discussion list ; Sobey, Richard A Subject: Re: [gpfsug-discuss] Fwd: update smb package ? [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base | 3.6 kB 00:00:00 epel/x86_64/metalink | 24 kB 00:00:00 epel | 4.3 kB 00:00:00 extras | 3.4 kB 00:00:00 updates | 3.4 kB 00:00:00 (1/4): epel/x86_64/updateinfo | 789 kB 00:00:00 (2/4): extras/7/x86_64/primary_db | 188 kB 00:00:00 (3/4): epel/x86_64/primary_db | 4.8 MB 00:00:00 (4/4): updates/7/x86_64/primary_db | 7.7 MB 00:00:01 Loading mirror speeds from cached hostfile * base: centos.spd.co.il * epel: mirror.nonstop.co.il * extras: centos.spd.co.il * updates: centos.spd.co.il No package gpfs.smb available. 
Error: Nothing to do [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ something is missing in my machine :) On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: > You don't have the gpfs.smb package installed. > > > > Yum install gpfs.smb > > > > Or install the package manually from /usr/lpp/mmfs//smb_rpms > > > > [root at ces ~]# rpm -qa | grep gpfs > > gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 > > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan > Schwarts > Sent: 05 July 2017 13:19 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Fwd: update smb package ? > > > > [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs > > gpfs.ext-4.2.2-0.x86_64 > > gpfs.msg.en_US-4.2.2-0.noarch > > gpfs.gui-4.2.2-0.noarch > > gpfs.gpl-4.2.2-0.noarch > > gpfs.gskit-8.0.50-57.x86_64 > > gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 > > gpfs.adv-4.2.2-0.x86_64 > > gpfs.java-4.2.2-0.x86_64 > > gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 > > gpfs.base-4.2.2-0.x86_64 > > gpfs.crypto-4.2.2-0.x86_64 > > [root at LH20-GPFS1 ~]# uname -a > > Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 > > 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux > > [root at LH20-GPFS1 ~]# > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From ilan84 at gmail.com Wed Jul 5 14:08:39 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 16:08:39 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: Sorry for newbish question, What do you mean by "from Fix Central", Do i need to define another repository for the yum ? or download manually ? its spectrum scale 4.2.2 On Wed, Jul 5, 2017 at 3:41 PM, Sobey, Richard A wrote: > Ah... yes you need to download the protocols version of gpfs from Fix Central. Same GPFS but with the SMB/Object etc packages. > > -----Original Message----- > From: Ilan Schwarts [mailto:ilan84 at gmail.com] > Sent: 05 July 2017 13:29 > To: gpfsug main discussion list ; Sobey, Richard A > Subject: Re: [gpfsug-discuss] Fwd: update smb package ? > > [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base > > | 3.6 kB 00:00:00 > epel/x86_64/metalink > > | 24 kB 00:00:00 > epel > > | 4.3 kB 00:00:00 > extras > > | 3.4 kB 00:00:00 > updates > > | 3.4 kB 00:00:00 > (1/4): epel/x86_64/updateinfo > > | 789 kB 00:00:00 > (2/4): extras/7/x86_64/primary_db > > | 188 kB 00:00:00 > (3/4): epel/x86_64/primary_db > > | 4.8 MB 00:00:00 > (4/4): updates/7/x86_64/primary_db > > | 7.7 MB 00:00:01 > Loading mirror speeds from cached hostfile > * base: centos.spd.co.il > * epel: mirror.nonstop.co.il > * extras: centos.spd.co.il > * updates: centos.spd.co.il > No package gpfs.smb available. > Error: Nothing to do > > > [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ > gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ > > > something is missing in my machine :) > > > On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: >> You don't have the gpfs.smb package installed. 
>> >> >> >> Yum install gpfs.smb >> >> >> >> Or install the package manually from /usr/lpp/mmfs//smb_rpms >> >> >> >> [root at ces ~]# rpm -qa | grep gpfs >> >> gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 >> >> >> >> >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan >> Schwarts >> Sent: 05 July 2017 13:19 >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] Fwd: update smb package ? >> >> >> >> [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs >> >> gpfs.ext-4.2.2-0.x86_64 >> >> gpfs.msg.en_US-4.2.2-0.noarch >> >> gpfs.gui-4.2.2-0.noarch >> >> gpfs.gpl-4.2.2-0.noarch >> >> gpfs.gskit-8.0.50-57.x86_64 >> >> gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 >> >> gpfs.adv-4.2.2-0.x86_64 >> >> gpfs.java-4.2.2-0.x86_64 >> >> gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 >> >> gpfs.base-4.2.2-0.x86_64 >> >> gpfs.crypto-4.2.2-0.x86_64 >> >> [root at LH20-GPFS1 ~]# uname -a >> >> Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 >> >> 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux >> >> [root at LH20-GPFS1 ~]# >> >> _______________________________________________ >> >> gpfsug-discuss mailing list >> >> gpfsug-discuss at spectrumscale.org >> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > -- > > > - > Ilan Schwarts -- - Ilan Schwarts From S.J.Thompson at bham.ac.uk Wed Jul 5 14:40:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 5 Jul 2017 13:40:46 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: IBM code comes from either IBM Passport Advantage (where you sign in with a corporate account that lists your product associations), or from IBM Fix Central (google it). Fix Central is supposed to be for service updates. Give the lack of experience, you may want to look at the install toolkit which ships with Spectrum Scale. Simon On 05/07/2017, 14:08, "gpfsug-discuss-bounces at spectrumscale.org on behalf of ilan84 at gmail.com" wrote: >Sorry for newbish question, >What do you mean by "from Fix Central", >Do i need to define another repository for the yum ? or download manually >? >its spectrum scale 4.2.2 > >On Wed, Jul 5, 2017 at 3:41 PM, Sobey, Richard A >wrote: >> Ah... yes you need to download the protocols version of gpfs from Fix >>Central. Same GPFS but with the SMB/Object etc packages. >> >> -----Original Message----- >> From: Ilan Schwarts [mailto:ilan84 at gmail.com] >> Sent: 05 July 2017 13:29 >> To: gpfsug main discussion list ; >>Sobey, Richard A >> Subject: Re: [gpfsug-discuss] Fwd: update smb package ? >> >> [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: >>fastestmirror, langpacks base >> >> | 3.6 kB 00:00:00 >> epel/x86_64/metalink >> >> | 24 kB 00:00:00 >> epel >> >> | 4.3 kB 00:00:00 >> extras >> >> | 3.4 kB 00:00:00 >> updates >> >> | 3.4 kB 00:00:00 >> (1/4): epel/x86_64/updateinfo >> >> | 789 kB 00:00:00 >> (2/4): extras/7/x86_64/primary_db >> >> | 188 kB 00:00:00 >> (3/4): epel/x86_64/primary_db >> >> | 4.8 MB 00:00:00 >> (4/4): updates/7/x86_64/primary_db >> >> | 7.7 MB 00:00:01 >> Loading mirror speeds from cached hostfile >> * base: centos.spd.co.il >> * epel: mirror.nonstop.co.il >> * extras: centos.spd.co.il >> * updates: centos.spd.co.il >> No package gpfs.smb available. 
>> Error: Nothing to do >> >> >> [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ >> gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ >> >> >> something is missing in my machine :) >> >> >> On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A >> wrote: >>> You don't have the gpfs.smb package installed. >>> >>> >>> >>> Yum install gpfs.smb >>> >>> >>> >>> Or install the package manually from /usr/lpp/mmfs//smb_rpms >>> >>> >>> >>> [root at ces ~]# rpm -qa | grep gpfs >>> >>> gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 >>> >>> >>> >>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan >>> Schwarts >>> Sent: 05 July 2017 13:19 >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] Fwd: update smb package ? >>> >>> >>> >>> [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs >>> >>> gpfs.ext-4.2.2-0.x86_64 >>> >>> gpfs.msg.en_US-4.2.2-0.noarch >>> >>> gpfs.gui-4.2.2-0.noarch >>> >>> gpfs.gpl-4.2.2-0.noarch >>> >>> gpfs.gskit-8.0.50-57.x86_64 >>> >>> gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 >>> >>> gpfs.adv-4.2.2-0.x86_64 >>> >>> gpfs.java-4.2.2-0.x86_64 >>> >>> gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 >>> >>> gpfs.base-4.2.2-0.x86_64 >>> >>> gpfs.crypto-4.2.2-0.x86_64 >>> >>> [root at LH20-GPFS1 ~]# uname -a >>> >>> Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 >>> >>> 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux >>> >>> [root at LH20-GPFS1 ~]# >>> >>> _______________________________________________ >>> >>> gpfsug-discuss mailing list >>> >>> gpfsug-discuss at spectrumscale.org >>> >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> -- >> >> >> - >> Ilan Schwarts > > > >-- > > >- >Ilan Schwarts >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From hpc-luke at uconn.edu Wed Jul 5 15:52:52 2017 From: hpc-luke at uconn.edu (hpc-luke at uconn.edu) Date: Wed, 05 Jul 2017 10:52:52 -0400 Subject: [gpfsug-discuss] Mass UID migration suggestions Message-ID: <595cfd44.kc2G2OUXdgiX+srO%hpc-luke@uconn.edu> Thank you both, I was already using the c++ stl hash map to do the mapping of uid_t to uid_t, but I will use that example to learn how to use the proper gpfs apis. And thank you for the ACL suggestion, as that is likely the best way to handle certain users who are logged in/running jobs constantly, where we would not like to force them to logout. And thank you for the reminder to re-run backups. Thank you for your time, Luke Storrs-HPC University of Connecticut From mweil at wustl.edu Wed Jul 5 16:51:50 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 5 Jul 2017 10:51:50 -0500 Subject: [gpfsug-discuss] pmcollector node Message-ID: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> Hello all, Question on the requirements on pmcollector node/s for a 500+ node cluster. Is there a sizing guide? What specifics should we scale? CPU Disks memory? Thanks Matt From kkr at lbl.gov Wed Jul 5 17:23:38 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 5 Jul 2017 09:23:38 -0700 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Message-ID: As I understand it, there is currently no way to collect just a subset of stats in a category. 
For example, CPU stats are: cpu_contexts cpu_guest cpu_guest_nice cpu_hiq cpu_idle cpu_interrupts cpu_iowait cpu_nice cpu_siq cpu_steal cpu_system cpu_user but I'm only interested in tracking a subset. The config file seems to want the category "CPU" which seems like an all-or-nothing approach. I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? Thanks, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 5 18:00:44 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 5 Jul 2017 17:00:44 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Message-ID: <11A5144D-A5AF-4829-B7D4-4313F357C6CB@nuance.com> Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Jul 5 19:22:14 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 5 Jul 2017 11:22:14 -0700 Subject: [gpfsug-discuss] Meaning of API Stats Category In-Reply-To: References: Message-ID: Thank you Eric. That did help. On Mon, Jun 12, 2017 at 2:01 PM, IBM Spectrum Scale wrote: > Hello Kristy, > > The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of > view of "applications" in the sense that they provide stats about I/O > requests made to files in GPFS file systems from user level applications > using POSIX interfaces like open(), close(), read(), write(), etc. > > This is in contrast to similarly named sensors without the "API" suffix, > like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O > requests made by the GPFS code to NSDs (disks) making up GPFS file systems. > > The relationship between application I/O and disk I/O might or might not > be obvious. Consider some examples. An application that starts > sequentially reading a file might, at least initially, cause more disk I/O > than expected because GPFS has decided to prefetch data. An application > write() might not immediately cause a the writing of disk blocks due to the > operation of the pagepool. Ultimately, application write()s might cause > twice as much data written to disk due to the replication factor of the > file system. Application I/O concerns itself with user data; disk I/O > might have to occur to handle the user data and associated file system > metadata (like inodes and indirect blocks). > > The difference between GPFSFileSystemAPI and GPFSNodeAPI: > GPFSFileSystemAPI reports stats for application I/O per filesystem per > node; GPFSNodeAPI reports application I/O stats per node. Similarly, > GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode > reports disk I/O stats per node. > > I hope this helps. 
> Eric Agar > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 06/12/2017 04:43 PM > Subject: Re: [gpfsug-discuss] Meaning of API Stats Category > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Kristy > > What I *think* the difference is: > > gpfs_fis: - calls to the GPFS file system interface > gpfs_fs: calls from the node that actually make it to the NSD > server/metadata > > The difference being what?s served out of the local node pagepool. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > *From: * on behalf of Kristy > Kallback-Rose > * Reply-To: *gpfsug main discussion list > > * Date: *Monday, June 12, 2017 at 3:17 PM > * To: *gpfsug main discussion list > * Subject: *[EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category > > Hi, > > Can anyone provide more detail about what is meant by the following two > categories of stats? The PDG has a limited description as far as I could > see. I'm not sure what is meant by Application PoV. Would the Grafana > bridge count as an "application"? > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfadden at us.ibm.com Wed Jul 5 19:50:24 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Wed, 5 Jul 2017 18:50:24 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks In-Reply-To: <11A5144D-A5AF-4829-B7D4-4313F357C6CB@nuance.com> Message-ID: What do You mean by category? Node class, metric type or something else? On Jul 5, 2017, 10:01:33 AM, Robert.Oesterlin at nuance.com wrote: From: Robert.Oesterlin at nuance.com To: gpfsug-discuss at spectrumscale.org Cc: Date: Jul 5, 2017 10:01:33 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sfadden at us.ibm.com Wed Jul 5 19:51:46 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Wed, 5 Jul 2017 18:51:46 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks In-Reply-To: Message-ID: Never mind just saw your earlier email On Jul 5, 2017, 11:50:24 AM, sfadden at us.ibm.com wrote: From: sfadden at us.ibm.com To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: Jul 5, 2017 11:50:24 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks What do You mean by category? Node class, metric type or something else? On Jul 5, 2017, 10:01:33 AM, Robert.Oesterlin at nuance.com wrote: From: Robert.Oesterlin at nuance.com To: gpfsug-discuss at spectrumscale.org Cc: Date: Jul 5, 2017 10:01:33 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Jul 6 06:37:33 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 6 Jul 2017 11:07:33 +0530 Subject: [gpfsug-discuss] pmcollector node In-Reply-To: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> References: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> Message-ID: Hi Anna, Can you please check if you can answer this. Or else let me know who to contact for this. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Matt Weil To: gpfsug-discuss at spectrumscale.org Date: 07/05/2017 09:22 PM Subject: [gpfsug-discuss] pmcollector node Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello all, Question on the requirements on pmcollector node/s for a 500+ node cluster. Is there a sizing guide? What specifics should we scale? CPU Disks memory? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Wei1.Guo at UTSouthwestern.edu Thu Jul 6 18:49:32 2017 From: Wei1.Guo at UTSouthwestern.edu (Wei Guo) Date: Thu, 6 Jul 2017 17:49:32 +0000 Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory Message-ID: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> Hi, All, We are testing to upgrade our clients to new RHEL 7.3 kernel with GPFS 4.2.1.0. When we have 3.10.0-514.26.2.el7, installing the gplbin has the following errors: # ./mmbuildgpl --build-package -v # cd /root/rpmbuild/RPMS/x86_64/ # rpm -ivh gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64.rpm Running transaction Installing : gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64 1/1 depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory depmod: ERROR: fstatat(4, mmfslinux.ko): No such file or directory depmod: ERROR: fstatat(4, tracedev.ko): No such file or directory depmod -a also show the three kernel extension not found. However, in the following directory, they are there. # pwd /lib/modules/3.10.0-514.26.2.el7.x86_64/extra # ls kernel mmfs26.ko mmfslinux.ko tracedev.ko The error does not show in a slightly older kernel -3.10.0-514.21.2 version. From https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Table 29, both versions should be supported. RHEL Distribution Latest Kernel Level Tested1 Minimum Kernel Level Required2 Minimum IBM Spectrum Scale Level Tested3 Minimum IBM Spectrum Scale Level Supported4 7.3 3.10.0-514 3.10.0-514 V4.1.1.11/V4.2.2.1 V4.1.1.11/V4.2.1.2 For technical reasons, this test node will not be added to production. A previous thread http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-April/001529.html indicated that this will be OK. However, it is better to get a clear conclusion before we update other client nodes. Shall we recompile the kernel? Thanks all. Wei Guo ________________________________ UT Southwestern Medical Center The future of medicine, today. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 6 18:52:44 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 6 Jul 2017 17:52:44 +0000 Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory In-Reply-To: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> References: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> Message-ID: Look in the kernel weak-updates directory, you will probably find some broken files in there. These come from things trying to update the kernel modules when you do the kernel upgrade. Just delete the three gpfs related ones and run depmod The safest way is to remove the gpfs.gplbin packages, then upgrade the kernel, reboot and add the new gpfs.gplbin packages for the new kernel. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Wei Guo [Wei1.Guo at UTSouthwestern.edu] Sent: 06 July 2017 18:49 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory Hi, All, We are testing to upgrade our clients to new RHEL 7.3 kernel with GPFS 4.2.1.0. 
When we have 3.10.0-514.26.2.el7, installing the gplbin has the following errors: # ./mmbuildgpl --build-package ?v # cd /root/rpmbuild/RPMS/x86_64/ # rpm -ivh gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64.rpm Running transaction Installing : gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64 1/1 depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory depmod: ERROR: fstatat(4, mmfslinux.ko): No such file or directory depmod: ERROR: fstatat(4, tracedev.ko): No such file or directory depmod -a also show the three kernel extension not found. However, in the following directory, they are there. # pwd /lib/modules/3.10.0-514.26.2.el7.x86_64/extra # ls kernel mmfs26.ko mmfslinux.ko tracedev.ko The error does not show in a slightly older kernel -3.10.0-514.21.2 version. From https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Table 29, both versions should be supported. RHEL Distribution Latest Kernel Level Tested1 Minimum Kernel Level Required2 Minimum IBM Spectrum Scale Level Tested3 Minimum IBM Spectrum Scale Level Supported4 7.3 3.10.0-514 3.10.0-514 V4.1.1.11/V4.2.2.1 V4.1.1.11/V4.2.1.2 For technical reasons, this test node will not be added to production. A previous thread http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-April/001529.html indicated that this will be OK. However, it is better to get a clear conclusion before we update other client nodes. Shall we recompile the kernel? Thanks all. Wei Guo ________________________________ UT Southwestern Medical Center The future of medicine, today. From abeattie at au1.ibm.com Thu Jul 6 06:07:07 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 6 Jul 2017 05:07:07 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14992893800360.png Type: image/png Size: 431718 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14992893800362.png Type: image/png Size: 1001127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14993172756190.png Type: image/png Size: 381651 bytes Desc: not available URL: From neil.wilson at metoffice.gov.uk Fri Jul 7 10:18:40 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Fri, 7 Jul 2017 09:18:40 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: Hi Andrew, Have you created new dashboards for GPFS? This shows you how to do it https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Creating%20Grafana%20dashboard Alternatively there are some predefined dashboards here https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Importing%20predefined%20Grafana%20dashboards that you can import and have a play around with? Regards Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. 
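In addition to the dashboards, it can help to confirm that each link in the chain is returning data before looking at Grafana itself. A rough sketch of the checks, run on the node hosting the pmcollector and the bridge; cpu_user is just a metric the default sensors collect, and port 4242 is an assumption, so use whatever port the bridge script was actually started with:

# 1. Does the collector have data at all? Query a basic metric via the CLI.
mmperfmon query cpu_user 10

# 2. Are the sensors pointed at the right collector?
mmperfmon config show | grep -A 3 collectors

# 3. Does the bridge answer OpenTSDB-style requests? Grafana's OpenTSDB
#    data source uses calls like this one to look up metric names.
curl -s "http://localhost:4242/api/suggest?type=metrics&max=10"

If step 1 returns rows but the curl call returns nothing, the problem sits between the bridge and Grafana; if step 1 is already empty, the sensor/collector side needs attention first.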
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 06 July 2017 06:07 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Greetings, I'm currently setting up Grafana to interact with one of our Scale Clusters and i've followed the knowledge centre link in terms of setup. https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm However while everything appears to be working i'm not seeing any data coming through the reports within the grafana server, even though I can see data in the Scale GUI The current environment: [root at sc01n02 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: sc01.spectrum GPFS cluster id: 18085710661892594990 GPFS UID domain: sc01.spectrum Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------ 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon [root at sc01n02 ~]# [root at sc01n02 ~]# mmlsconfig Configuration data for cluster sc01.spectrum: --------------------------------------------- clusterName sc01.spectrum clusterId 18085710661892594990 autoload yes profile gpfsProtocolDefaults dmapiFileHandleSize 32 minReleaseLevel 4.2.2.0 ccrEnabled yes cipherList AUTHONLY maxblocksize 16M [cesNodes] maxMBpS 5000 numaMemoryInterleave yes enforceFilesetQuotaOnRoot yes workerThreads 512 [common] tscCmdPortRange 60000-61000 cesSharedRoot /ibm/cesSharedRoot/ces cifsBypassTraversalChecking yes syncSambaMetadataOps yes cifsBypassShareLocksOnRename yes adminMode central File systems in cluster sc01.spectrum: -------------------------------------- /dev/cesSharedRoot /dev/icos_demo /dev/scale01 [root at sc01n02 ~]# [root at sc01n02 ~]# systemctl status pmcollector ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. Loaded: loaded (/etc/rc.d/init.d/pmcollector) Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago Docs: man:systemd-sysv-generator(8) Main PID: 2693 (ZIMonCollector) CGroup: /system.slice/pmcollector.service ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance mon...... May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor collector... May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance moni...r.. Hint: Some lines were ellipsized, use -l to show in full. From Grafana Server: [cid:image002.jpg at 01D2F70A.17F595F0] when I send a set of files to the cluster (3.8GB) I can see performance metrics within the Scale GUI [cid:image004.jpg at 01D2F70A.17F595F0] yet from the Grafana Dashboard im not seeing any data points [cid:image006.jpg at 01D2F70A.17F595F0] Can anyone provide some hints as to what might be happening? Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 14522 bytes Desc: image002.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 60060 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 25781 bytes Desc: image006.jpg URL: From olaf.weiser at de.ibm.com Fri Jul 7 10:18:13 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 7 Jul 2017 09:18:13 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 431718 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1001127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 381651 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Jul 7 13:01:39 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 7 Jul 2017 12:01:39 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: Just following up on this, has anyone successfully deployed Protocols (SMB) on RHEL 7.3 with the 4.2.3-2 packages? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 July 2017 12:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. 
Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Jul 7 23:32:40 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 7 Jul 2017 15:32:40 -0700 Subject: [gpfsug-discuss] Hold the Date - Spectrum Scale Day @ HPCXXL (Sept 2017, NYC) Message-ID: Hello, More details will be provided as they become available, but just so you can make a placeholder on your calendar, there will be a Spectrum Scale Day the week of September 25th - 29th, likely Thursday, September 28th. This will be a part of the larger HPCXXL meeting (https://www.spxxl.org/?q=New-York-City-2017 ). You may recall this group was formerly called SPXXL and the website is in the process of transitioning to the new name (and getting a new certificate). You will be able to attend *just* the Spectrum Scale day if that is the only portion of the event you would like to attend. The NYC location is great for Spectrum Scale events because many IBMers, including developers, can come in from Poughkeepsie. More as we get closer to the date and details are settled. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Sun Jul 9 08:26:44 2017 From: a.khiredine at meteo.dz (Atmane) Date: Sun, 9 Jul 2017 08:26:44 +0100 Subject: [gpfsug-discuss] GPFS Storage Server (GSS) Message-ID: From a.khiredine at meteo.dz Sun Jul 9 09:00:07 2017 From: a.khiredine at meteo.dz (Atmane) Date: Sun, 9 Jul 2017 09:00:07 +0100 Subject: [gpfsug-discuss] get free space in GSS Message-ID: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From laurence at qsplace.co.uk Sun Jul 9 09:58:05 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Sun, 09 Jul 2017 09:58:05 +0100 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: Message-ID: You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: >Dear all, > >My name is Khiredine Atmane and I am a HPC system administrator at the > >National Office of Meteorology Algeria . We have a GSS24 running >gss2.5.10.3-3b and gpfs-4.2.0.3. 
> >GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks >total, 0 >NVRAM partitions > >disks = 3Tb >SSD = 200 Gb >df -h >Filesystem Size Used Avail Use% Mounted on > >/dev/gpfs1 49T 18T 31T 38% /gpfs1 >/dev/gpfs2 53T 13T 40T 25% /gpfs2 >/dev/gpfs3 25T 4.9T 20T 21% /gpfs3 >/dev/gpfs4 11T 133M 11T 1% /gpfs4 >/dev/gpfs5 323T 34T 290T 11% /gpfs5 > >Total Is 461 To > >I think we have more space >Could anyone make recommendation to troubleshoot find how many free >space >in GSS ? >How to find the available space ? >Thank you! > >Atmane > > > >-- >Atmane Khiredine >HPC System Admin | Office National de la M?t?orologie >T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : >a.khiredine at meteo.dz >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Sun Jul 9 13:26:26 2017 From: a.khiredine at meteo.dz (atmane khiredine) Date: Sun, 9 Jul 2017 12:26:26 +0000 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: , Message-ID: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> thank you very much for replying. I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual 
rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. 
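A convenience note on the commands Laurence suggests above: the per-recovery-group listing can be wrapped in a small shell loop that prints only the free-space rows for every declustered array in the cluster. This is just a sketch; the awk header skip and the grep pattern assume the mmlsrecoverygroup output layout shown in this thread, so adjust them if your release formats the output differently.

# print the free-space line of every declustered array in every recovery group (rough sketch)
for rg in $(mmlsrecoverygroup | awk 'NR>4 {print $1}'); do      # NR>4 skips the header block shown above; adjust if needed
    echo "=== recovery group $rg ==="
    mmlsrecoverygroup "$rg" -L | grep -E '^ *(LOG|DA[0-9]+) '   # keep only the per-array rows that carry the "free space" column
done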
From janfrode at tanso.net Sun Jul 9 17:45:32 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sun, 09 Jul 2017 16:45:32 +0000 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> References: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: You had it here: [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low 12 GiB in DA1, and 4096 MiB i DA2, but effectively you'll get less when you add a raidCode to the vdisk. Best way to use it id to just don't specify a size to the vdisk, and max possible size will be used. -jf s?n. 9. jul. 2017 kl. 14.26 skrev atmane khiredine : > thank you very much for replying. I can not find the free space > > Here is the output of mmlsrecoverygroup > > [root at server1 ~]#mmlsrecoverygroup > > declustered > arrays with > recovery group vdisks vdisks servers > ------------------ ----------- ------ ------- > BB1RGL 3 18 server1,server2 > BB1RGR 3 18 server2,server1 > -------------------------------------------------------------- > [root at server ~]# mmlsrecoverygroup BB1RGL -L > > declustered > recovery group arrays vdisks pdisks format version > ----------------- ----------- ------ ------ -------------- > BB1RGL 3 18 119 4.2.0.1 > > declustered needs replace > scrub background activity > array service vdisks pdisks spares threshold free space > duration task progress priority > ----------- ------- ------ ------ ------ --------- ---------- > -------- ------------------------- > LOG no 1 3 0,0 1 558 GiB 14 > days scrub 51% low > DA1 no 11 58 2,31 2 12 GiB 14 > days scrub 78% low > DA2 no 6 58 2,31 2 4096 MiB 14 > days scrub 10% low > > declustered > checksum > vdisk RAID code array vdisk size block > size granularity state remarks > ------------------ ------------------ ----------- ---------- > ---------- ----------- ----- ------- > gss0_logtip 3WayReplication LOG 128 MiB 1 > MiB 512 ok logTip > gss0_loghome 4WayReplication DA1 40 GiB 1 > MiB 512 ok log > BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 > MiB 32 KiB ok > BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 > MiB 32 KiB ok > BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 > MiB 32 KiB ok > BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 > MiB 32 KiB ok > > config data declustered 
array VCD spares actual rebuild > spare space remarks > ------------------ ------------------ ------------- > --------------------------------- ---------------- > rebuild space DA1 31 34 pdisk > rebuild space DA2 31 35 pdisk > > > config data max disk group fault tolerance actual disk group > fault tolerance remarks > ------------------ --------------------------------- > --------------------------------- ---------------- > rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 > drawer limiting fault tolerance > system index 2 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > > vdisk max disk group fault tolerance actual disk group > fault tolerance remarks > ------------------ --------------------------------- > --------------------------------- ---------------- > gss0_logtip 2 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS4_DATA1 2 drawer 2 drawer > BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS1_DATA1 2 drawer 2 drawer > BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS3_DATA1 2 drawer 2 drawer > BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS2_DATA1 2 drawer 2 drawer > BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS2_DATA2 2 drawer 2 drawer > BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS1_DATA2 2 drawer 2 drawer > BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS5_DATA1 2 drawer 2 drawer > BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS5_DATA2 2 drawer 2 drawer > > active recovery group server servers > ----------------------------------------------- ------- > server1 server1,server2 > > > Atmane Khiredine > HPC System Administrator | Office National de la M?t?orologie > T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : > a.khiredine at meteo.dz > ________________________________ > De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] > Envoy? : dimanche 9 juillet 2017 09:58 > ? : gpfsug main discussion list; atmane khiredine; > gpfsug-discuss at spectrumscale.org > Objet : Re: [gpfsug-discuss] get free space in GSS > > You can check the recovery groups to see if there is any remaining space. > > I don't have access to my test system to confirm the syntax however if > memory serves. > > Run mmlsrecoverygroup to get a list of all the recovery groups then: > > mmlsrecoverygroup -L > > This will list all your declustered arrays and their free space. > > Their might be another method, however this way has always worked well for > me. > > -- Lauz > > > > On 9 July 2017 09:00:07 BST, Atmane wrote: > > Dear all, > > My name is Khiredine Atmane and I am a HPC system administrator at the > National Office of Meteorology Algeria . We have a GSS24 running > gss2.5.10.3-3b and gpfs-4.2.0.3. 
> > GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 > NVRAM partitions > > disks = 3Tb > SSD = 200 Gb > df -h > Filesystem Size Used Avail Use% Mounted on > > /dev/gpfs1 49T 18T 31T 38% /gpfs1 > /dev/gpfs2 53T 13T 40T 25% /gpfs2 > /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 > /dev/gpfs4 11T 133M 11T 1% /gpfs4 > /dev/gpfs5 323T 34T 290T 11% /gpfs5 > > Total Is 461 To > > I think we have more space > Could anyone make recommendation to troubleshoot find how many free space > in GSS ? > How to find the available space ? > Thank you! > > Atmane > > > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Sun Jul 9 17:52:02 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Sun, 9 Jul 2017 12:52:02 -0400 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> References: , <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: Hi Atmane, >> I can not find the free space Based on your output below, your setup currently has two recovery groups BB1RGL and BB1RGR. Issue "mmlsrecoverygroup BB1RGL -L" and "mmlsrecoverygroup BB1RGR -L" to obtain free space in each DA. Based on your "mmlsrecoverygroup BB1RGL -L" output below, BB1RGL "DA1" has 12GiB and "DA2" has 4GiB free space. The metadataOnly and dataOnly vdisk/NSD are created from DA1 and DA2. declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low In addition, you may use "mmlsnsd" to obtain mapping of file-system to vdisk/NSD + use "mmdf " command to query user or available capacity on a GPFS file system. Hope this helps, -Kums From: atmane khiredine To: Laurence Horrocks-Barlow , "gpfsug main discussion list" Date: 07/09/2017 08:27 AM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org thank you very much for replying. 
I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 
drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Mon Jul 10 10:39:27 2017 From: a.khiredine at meteo.dz (Atmane) Date: Mon, 10 Jul 2017 10:39:27 +0100 Subject: [gpfsug-discuss] New Version Of GSS 3.1b 16-Feb-2017 Message-ID: Dear all, There is a new version of GSS Is there someone who made the update ? thanks Lenovo System x GPFS Storage Server (GSS) Version 3.1b 16-Feb-2017 What?s new in Lenovo GSS, Version 3.1 ? New features: - RHEL 7.2 ? GSS Expandability ? Online addition of more JBODs to an existing GSS building block (max. 6 JBOD total) ? Must be same JBOD type and drive type as in the existing building block ? Selectable Spectrum Scale (GPFS) software version and edition ?Four GSS tarballs, for Spectrum Scale {Standard or Advanced Edition} @ {v4.1.1 or v4.2.1} ? Hardware news: ? 10TB drive support: two JBOD MTMs (0796-HCJ/16X and 0796-HCK/17X), drive FRU (01GV110), no drive option ? Withdrawal of the 3TB drive models (0796-HC3/07X and 0796-HC4/08X) ? GSS22 in xConfig (no more need for special-bid) ? Software and firmware news: ? Update of IBM Spectrum Scale v4.2.1 to latest PTF level ? Update of Intel OPA from 10.1 to 10.2 (incl. performance fixes) ? 
Refresh of server and adapter FW levels to Scalable Infrastructure ?16C? recommended levels ? Not much news this time, as ?16C? FW is almost identical to ?16B - List GPFS RPM gpfs.adv-4.2.1-2.12.x86_64.rpm gpfs.base-4.2.1-2.12.x86_64.rpm gpfs.callhome-4.2.1-1.000.el7.noarch.rpm gpfs.callhome-ecc-client-4.2.1-1.000.noarch.rpm gpfs.crypto-4.2.1-2.12.x86_64.rpm gpfs.docs-4.2.1-2.12.noarch.rpm gpfs.ext-4.2.1-2.12.x86_64.rpm gpfs.gnr-4.2.1-2.12.x86_64.rpm gpfs.gnr.base-1.0.0-0.x86_64.rpm gpfs.gpl-4.2.1-2.12.noarch.rpm gpfs.gskit-8.0.50-57.x86_64.rpm gpfs.gss.firmware-4.2.0-5.x86_64.rpm gpfs.gss.pmcollector-4.2.2-2.el7.x86_64.rpm gpfs.gss.pmsensors-4.2.2-2.el7.x86_64.rpm gpfs.gui-4.2.1-2.3.noarch.rpm gpfs.java-4.2.2-2.x86_64.rpm gpfs.msg.en_US-4.2.1-2.12.noarch.rpm -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From Greg.Lehmann at csiro.au Tue Jul 11 05:54:39 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Tue, 11 Jul 2017 04:54:39 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: <4c9ae144c1114b85b7f2cdc27eefd749@exch1-cdc.nexus.csiro.au> Yes, although it is early days for us and I would not say we have finished testing as yet. We have upgraded twice to get there from 4.2.3-0. It seems OK and I have not noticed any changes from 4.2.3.0. Greg From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: Friday, 7 July 2017 10:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Just following up on this, has anyone successfully deployed Protocols (SMB) on RHEL 7.3 with the 4.2.3-2 packages? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 July 2017 12:12 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. 
This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From heiner.billich at psi.ch Tue Jul 11 10:36:39 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Tue, 11 Jul 2017 09:36:39 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA Message-ID: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch>
Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides - 6 GB/s. Does AFM's NFS client on gateway nodes support NFS using RDMA?
I would like to try. Or should we try to tune NFS and the IP stack? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes?
We can't use a native gpfs multicluster mount - this links home and cache much too strong: if home fails cache will unmount the cache fileset - this is what I get from the manuals.
We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich
-- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02
From abeattie at au1.ibm.com Tue Jul 11 11:14:37 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Tue, 11 Jul 2017 10:14:37 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Message-ID: An HTML attachment was scrubbed... URL:
From bbanister at jumptrading.com Tue Jul 11 15:46:42 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Jul 2017 14:46:42 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Message-ID: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com>
Sounds like a very interesting topic for an upcoming GPFS UG meeting... say SC'17? -B
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Tuesday, July 11, 2017 5:15 AM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org; jake.carroll at uq.edu.au Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA
Billich, Reach out to Jake Carroll at Uni of QLD. UQ have been playing with NFS over 10Gb / 40Gb and 100Gb Ethernet and there is LOTS of tuning that you can do to improve how things work. Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com
----- Original message ----- From: "Billich Heinrich Rainer (PSI)" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [gpfsug-discuss] does AFM support NFS via RDMA Date: Tue, Jul 11, 2017 7:36 PM
Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides - 6 GB/s. Does AFM's NFS client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune NFS and the IP stack?
I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Tue Jul 11 23:07:49 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 11 Jul 2017 15:07:49 -0700 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Message-ID: <9BA6A8E3-D633-4DFF-826F-5ACE49361694@lbl.gov> Sounds good. Is someone willing to take on this talk? User-driven talks on real experiences are always welcome. Cheers, Kristy > On Jul 11, 2017, at 7:46 AM, Bryan Banister wrote: > > Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? > -B > > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org ] On Behalf Of Andrew Beattie > Sent: Tuesday, July 11, 2017 5:15 AM > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org ; jake.carroll at uq.edu.au > Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA > > Bilich, > > Reach out to Jake Carrol at Uni of QLD > > UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet > and there is LOTS of tuning that you can do to improve how things work > > Regards, > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: "Billich Heinrich Rainer (PSI)" > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org " > > Cc: > Subject: [gpfsug-discuss] does AFM support NFS via RDMA > Date: Tue, Jul 11, 2017 7:36 PM > > Hello, > > We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? 
> > We can't use a native gpfs multicluster mount - this links home and cache much too strong: if home fails cache will unmount the cache fileset - this is what I get from the manuals.
> > We run spectrum scale 4.2.2/4.2.3 on Redhat 7.
> > Thank you,
> > Heiner Billich
> > -- > Paul Scherrer Institut > Heiner Billich > WHGA 106 > CH 5232 Villigen > 056 310 36 02
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From Robert.Oesterlin at nuance.com Wed Jul 12 17:06:40 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 12 Jul 2017 16:06:40 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Message-ID:
Interesting. Performance is one thing, but how usable. IBM, watch your back :-)
"WekaIO is the world's fastest distributed file system, processing four times the workload compared to IBM Spectrum Scale measured on Standard Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. Utilizing only 120 cloud compute instances with locally attached storage, WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM's high-end FlashSystem 900."
https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/
Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From oehmes at gmail.com Wed Jul 12 18:24:19 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 12 Jul 2017 17:24:19 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID:
While I really like competition on SpecSFS, the claims from the WekaIO people are, let's say, 'alternative facts' at best. The Spectrum Scale results were done on 4 nodes with 2 flash storage devices attached; they compare this to a WekaIO system with 14 times more memory (14 TB vs 1 TB), 120 SSDs (vs 64 FlashCore Modules) across 15 times more compute nodes (60 vs 4). Said all this, the article claims 1,000 builds, while the actual submission only delivers 500 --> https://www.spec.org/sfs2014/results/sfs2014.html So they need 14 times more memory and cores and 2 times the flash to show twice as many builds at double the response time; I leave it to everybody who understands these facts to judge how great that result really is.
Said all this, Spectrum Scale scales almost linear if you double the nodes , network and storage accordingly, so there is no reason to believe we couldn't easily beat this, its just a matter of assemble the HW in a lab and run the test. btw we scale to 10k+ nodes , 2500 times the number we used in our publication :-D Sven On Wed, Jul 12, 2017 at 9:06 AM Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > Interesting. Performance is one thing, but how usable. IBM, watch your > back :-) > > > > *?WekaIO is the world?s fastest distributed file system, processing four > times the workload compared to IBM Spectrum Scale measured on Standard > Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry > benchmark. Utilizing only 120 cloud compute instances with locally attached > storage, WekaIO completed 1,000 simultaneous software builds compared to > 240 on IBM?s high-end FlashSystem 900.?* > > > > > https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 <(507)%20269-0413> > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Jul 12 19:20:06 2017 From: ewahl at osc.edu (Edward Wahl) Date: Wed, 12 Jul 2017 14:20:06 -0400 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: <20170712142006.297cc9f2@osc.edu> Ah benchmarks... There are Lies, damn Lies, and then benchmarks. I've been in HPC a while on both the vendor (Cray) and customer side, and until I see Lustre, BeeGFS, Spectrum Scale, StorNext, OrangeFS, CEPH, Gluster, 'Flash in the pan v1', etc. all run on the EXACT same hardware I take ALL benchmarks with a POUND of salt. Too easy to finagle whatever result you want. Besides, benchmarks and real world performance are vastly different unless you are using IO kernels based on your local apps as your benchmark. I have a feeling MANY of the folks on this list feel similarly. ;) I recall when we figured out how someone cheated a SPEC test once by only using the inner-track of drives. ^_^ Ed On Wed, 12 Jul 2017 16:06:40 +0000 "Oesterlin, Robert" wrote: > Interesting. Performance is one thing, but how usable. IBM, watch your > back :-) > > ?WekaIO is the world?s fastest distributed file system, processing four times > the workload compared to IBM Spectrum Scale measured on Standard Performance > Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. > Utilizing only 120 cloud compute instances with locally attached storage, > WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM?s > high-end FlashSystem 900.? > > https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From r.sobey at imperial.ac.uk Wed Jul 12 19:20:32 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 12 Jul 2017 18:20:32 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: I'm reading it as "WeakIO" which probably isn't a good thing.. 
both in the context of my eyesight and the negative connotation of the product :)
________________________________
From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Oesterlin, Robert Sent: 12 July 2017 17:06 To: gpfsug main discussion list Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System
Interesting. Performance is one thing, but how usable. IBM, watch your back :-)
"WekaIO is the world's fastest distributed file system, processing four times the workload compared to IBM Spectrum Scale measured on Standard Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. Utilizing only 120 cloud compute instances with locally attached storage, WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM's high-end FlashSystem 900."
https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/
Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From Robert.Oesterlin at nuance.com Wed Jul 12 19:27:12 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 12 Jul 2017 18:27:12 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Message-ID: <92349D18-3614-4235-B30C-ADCCE3782CDD@nuance.com>
Ah yes - Sven keeping us honest! Bob Oesterlin Sr Principal Storage Engineer, Nuance
From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Wednesday, July 12, 2017 at 12:24 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System
while i really like competition on SpecSFS, the claims from the WekaIO people are lets say 'alternative facts' at best The Spectrum Scale results were done on 4 Nodes with 2 Flash Storage devices attached, they compare this to a WekaIO system with 14 times more memory (14 TB vs 1TB) , 120 SSD's (vs 64 Flashcore Modules) across 15 times more compute nodes (60 vs 4) . said all this, the article claims 1000 builds, while the actual submission only delivers 500 --> https://www.spec.org/sfs2014/results/sfs2014.html
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From sannaik2 at in.ibm.com Fri Jul 14 06:55:30 2017 From: sannaik2 at in.ibm.com (Sandeep Naik1) Date: Fri, 14 Jul 2017 11:25:30 +0530 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: , <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID:
Hi Atmane, There can be two meanings of available free space. One is what is available in the existing file systems. For this you rightly referred to the df -h command output. This is the actual free space available in the already-created file systems.
Filesystem Size Used Avail Use% Mounted on
/dev/gpfs1 49T 18T 31T 38% /gpfs1
/dev/gpfs2 53T 13T 40T 25% /gpfs2
/dev/gpfs3 25T 4.9T 20T 21% /gpfs3
/dev/gpfs4 11T 133M 11T 1% /gpfs4
/dev/gpfs5 323T 34T 290T 11% /gpfs5
The other is the free space available in the DA (declustered array). For that, as everyone said, use mmlsrecoverygroup <recovery group> -L. Please note that this will give you the raw free capacity; for the usable free capacity in a DA you have to account for the RAID overhead. But based on your output you have very little/no free space left in the DAs.
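To make Sandeep's two views of free space concrete, here is a rough pair of commands using names from this thread (gpfs5 and BB1RGL are simply the examples above; option support such as --block-size may vary by release):

mmdf gpfs5 --block-size auto     # file-system view: free space inside an existing file system (what df -h reflects)
mmlsrecoverygroup BB1RGL -L      # declustered-array view: raw free space per DA (the "free space" column)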
[root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low Thanks, Sandeep Naik Elastic Storage server / GPFS Test ETZ-B, Hinjewadi Pune India (+91) 8600994314 From: "Kumaran Rajaram" To: gpfsug main discussion list , atmane khiredine Date: 09/07/2017 10:22 PM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Atmane, >> I can not find the free space Based on your output below, your setup currently has two recovery groups BB1RGL and BB1RGR. Issue "mmlsrecoverygroup BB1RGL -L" and "mmlsrecoverygroup BB1RGR -L" to obtain free space in each DA. Based on your "mmlsrecoverygroup BB1RGL -L" output below, BB1RGL "DA1" has 12GiB and "DA2" has 4GiB free space. The metadataOnly and dataOnly vdisk/NSD are created from DA1 and DA2. declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low In addition, you may use "mmlsnsd" to obtain mapping of file-system to vdisk/NSD + use "mmdf " command to query user or available capacity on a GPFS file system. Hope this helps, -Kums From: atmane khiredine To: Laurence Horrocks-Barlow , "gpfsug main discussion list" Date: 07/09/2017 08:27 AM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org thank you very much for replying. 
I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 
drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jul 17 13:13:58 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 17 Jul 2017 12:13:58 +0000 Subject: [gpfsug-discuss] Job Vacancy: Research Storage Systems Senior Specialist/Specialist Message-ID: Hi all, Members of this group may be particularly interested in the role "Research Storage Systems Senior Specialist/Specialist"... As part of the University of Birmingham's investment in our ability to support outstanding research by providing technical computing facilities, we are expanding the team and currently have 6 vacancies. I've provided a short description of each post, but please do follow the links where you will find the full job description attached at the bottom of the page. For some of the posts, they are graded either at 7 or 8 and will be appointed based upon skills and experience, the expectation is that if the appointment is made at grade 7 that as the successful candidate grows into the role, we should be able to regrade up. 
Research Storage Systems Senior Specialist/Specialist: https://goo.gl/NsL1EG Responsible for the delivery and maintenance of research storage systems, focussed on the delivery of Spectrum Scale storage systems and data protection. (this is available either as a grade 8 or grade 7 post depending on skills and experience so may suit someone wishing to grow into the senior role) HPC Specialist post (Research Systems Administrator / Senior Research Systems Administrator): https://goo.gl/1SxM4j Helping to deliver and operationally support the technical computing environments, with a focus on supporting and delivery of HPC and HTC services. (this is available either as a grade 7 or grade 8 post depending on skills and experience so may suit someone wishing to grow into the senior role) Research Computing (Analytics): https://goo.gl/uCNdMH Helping our researchers to understand data analytics and supporting their research Senior Research Software Engineer: https://goo.gl/dcGgAz Working with research groups to develop and deliver bespoke software solutions to support their research Research Training and Engagement Officer: https://goo.gl/U48m7z Helping with the delivery and coordination of training and engagement works to support users helping ensure they are able to use the facilities to support their research. Research IT Partner in the College of Arts and Law: https://goo.gl/A7czEA Providing technical knowledge and skills to support project delivery through research bid preparation to successful solution delivery. Simon From cgirda at wustl.edu Mon Jul 17 20:40:42 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Mon, 17 Jul 2017 14:40:42 -0500 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana Message-ID: Hello Team, This is Chakri from Washu at STL. Thank you for the great opportunity to join this group. I am trying to setup performance monitoring for our GPFS cluster. As part of the project configured pmcollector and pmsensors on our GPFS cluster. 1. Created a 'spectrumscale' data-source bridge on our grafana ( NOT SET TO DEFAULT ) https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm 2. Created a new dash-board by importing the pre-built dashboard. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Importing%20predefined%20Grafana%20dashboards Here is the issue. I don't get any graph updates if I don't set "spectrumscale" as DEFAULT data-source but that is breaking rest of the graphs ( we have ton of dashboards). So I had to uncheck the "spectrumscale" as default data-source. If I go and set the "data-source" manually to "spectrumscale" on the pre-built dashboard graphs. I see the wheel spinning but no updates. Any ideas? Thank you Chakri From Robert.Oesterlin at nuance.com Tue Jul 18 12:45:38 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 18 Jul 2017 11:45:38 +0000 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana Message-ID: Hi Chakri If you?re getting the ?ole ?spinning wheel? on your dashboard, then it?s one of two things: 1) The Grafana bridge is not running 2) The dashboard is requesting a metric that isn?t available. 
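A quick way to check both of those from a shell is sketched below. This is an illustration only, not a documented procedure: it assumes the bridge emulates the OpenTSDB HTTP API on its default port 4242 (which is what the Grafana data source talks to), and pmcollector-host is a placeholder for wherever the bridge is running.

# 1) is the bridge answering at all? expect an HTTP 200 back
curl -s -o /dev/null -w "%{http_code}\n" "http://pmcollector-host:4242/api/suggest?type=metrics&q=cpu&max=5"

# 2) does the collector know about the metrics the panel is requesting?
#    (port 4242 and the OpenTSDB-style /api/suggest endpoint are assumptions; adjust to your bridge config)
curl -s "http://pmcollector-host:4242/api/suggest?type=metrics&q=gpfs_fs&max=25"

If the first call does not return 200 the bridge itself is the problem; if the second returns an empty list, the dashboard is asking for a metric the sensors are not collecting.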
Assuming that you?ve verified that the pmcollector/pmsensor setup is work right in your cluster, I?d then start looking at the log files for the Grafana Bridge and the pmcollector to see if you can determine if either is producing an error - like the metric wasn?t found. The other thing to try is setup a small test graph with a known metric being collected by you pmsensor configuration, rather than try one of Helene?s default dashboards, which are fairly complex. Drop me a note directly if you need to. Bob Oesterlin Sr Principal Storage Engineer, Nuance From cgirda at wustl.edu Tue Jul 18 15:57:05 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Tue, 18 Jul 2017 09:57:05 -0500 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana In-Reply-To: References: Message-ID: Bob, Found the issue to be with https is getting blocked with "direct" connection. Switched it to proxy on the bridge-port. That helped and now I can see graphs. Thank you Chakri On 7/18/17 6:45 AM, Oesterlin, Robert wrote: > Hi Chakri > > If you?re getting the ?ole ?spinning wheel? on your dashboard, then it?s one of two things: > > 1) The Grafana bridge is not running > 2) The dashboard is requesting a metric that isn?t available. > > Assuming that you?ve verified that the pmcollector/pmsensor setup is work right in your cluster, I?d then start looking at the log files for the Grafana Bridge and the pmcollector to see if you can determine if either is producing an error - like the metric wasn?t found. The other thing to try is setup a small test graph with a known metric being collected by you pmsensor configuration, rather than try one of Helene?s default dashboards, which are fairly complex. > > Drop me a note directly if you need to. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Tue Jul 18 18:21:06 2017 From: david_johnson at brown.edu (David Johnson) Date: Tue, 18 Jul 2017 13:21:06 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited Message-ID: We also noticed a fair amount of CPU time accumulated by mmsysmon.py on our diskless compute nodes. I read the earlier query, where it was answered: > ces == Cluster Export Services, mmsysmon.py comes from mmcesmon. It is used for managing export services of GPFS. If it is killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't attempt to kill them. > Our question is this ? we don?t run the latest ?protocols", our NFS is CNFS, and our CIFS is clustered CIFS. I can understand it might be needed with Ganesha, but on every node? Why in the world would I be getting this daemon running on all client nodes, when I didn?t install the ?protocols" version of the distribution? We have release 4.2.2 at the moment. How can we disable this? Thanks, ? ddj -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jul 18 18:51:21 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 18 Jul 2017 17:51:21 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: There?s no official way to cleanly disable it so far as I know yet; but you can defacto disable it by deleting /var/mmfs/mmsysmon/mmsysmonitor.conf. It?s a huge problem. 
I don?t understand why it hasn?t been given much credit by dev or support. ~jonathon On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of David Johnson" wrote: We also noticed a fair amount of CPU time accumulated by mmsysmon.py on our diskless compute nodes. I read the earlier query, where it was answered: ces == Cluster Export Services, mmsysmon.py comes from mmcesmon. It is used for managing export services of GPFS. If it is killed, your nfs/smb etc will be out of work. Their overhead is small and they are very important. Don't attempt to kill them. Our question is this ? we don?t run the latest ?protocols", our NFS is CNFS, and our CIFS is clustered CIFS. I can understand it might be needed with Ganesha, but on every node? Why in the world would I be getting this daemon running on all client nodes, when I didn?t install the ?protocols" version of the distribution? We have release 4.2.2 at the moment. How can we disable this? Thanks, ? ddj From S.J.Thompson at bham.ac.uk Tue Jul 18 20:21:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 18 Jul 2017 19:21:46 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: So just following up on my questions from January. We tried to do 2. I.e. Restore to a new file-system with different block sizes. It got part way through creating the file-sets on the new SOBAR file-system and then GPFS asserts and crashes... We weren't actually intentionally trying to move block sizes, but because we were restoring from a traditional SAN based system to a shiny new GNR based system, we'd manually done the FS create steps. I have a PMR open now. I don't know if someone internally in IBM actually tried this after my emails, as apparently there is a similar internal defect which is ~6 months old... Simon From: > on behalf of Marc A Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 20 January 2017 at 17:57 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. From: "Simon Thompson (Research Computing - IT Services)" > To: "gpfsug-discuss at spectrumscale.org" > Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? 
We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From leslie.james.elliott at gmail.com Wed Jul 19 08:22:49 2017 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Wed, 19 Jul 2017 17:22:49 +1000 Subject: [gpfsug-discuss] AFM over NFS Message-ID: we are having a problem linking a target to a fileset we are able to manually connect with NFSv4 to the correct path on an NFS export down a particular subdirectory path, but when when we create a fileset with this same path as an afmTarget it connects with NFSv3 and actually connects to the top of the export even though mmafmctl displays the extended path information are we able to tell AFM to connect with NFSv4 in any way to work around this problem the NFS comes from a closed system, we can not change the configuration on it to fix the problem on the target thanks leslie -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Wed Jul 19 08:53:58 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 19 Jul 2017 07:53:58 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: I?m having a play with this now too. Has anybody coded a systemd unit to handle step 2b in the knowledge centre article ? bridge creation on the gpfs side? It would save me a bit of effort. I?m also wondering about the CherryPy version. It looks like this has been developed on SLES which has the newer version mentioned as a standard package and yet RHEL with an older version of CherryPy is perhaps more common as it seems to have the best support for features of GPFS, like object and block protocols. Maybe SLES is in favour now? 
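Something along these lines is roughly what I mean, as a sketch only; the install path, service name and collector host are assumptions to adjust to your setup, and the Restart settings are there because the bridge exits if the local pmcollector is not up yet:

cat > /etc/systemd/system/gpfs-grafana-bridge.service <<'EOF'
[Unit]
Description=IBM Spectrum Scale Grafana bridge (zimonGrafanaIntf.py)
# start after the collector; path and host below are assumptions, adjust to your install
After=network-online.target pmcollector.service
Wants=pmcollector.service

[Service]
Type=simple
WorkingDirectory=/opt/IBM/bridge
ExecStart=/usr/bin/python /opt/IBM/bridge/zimonGrafanaIntf.py -s localhost
# the bridge aborts if pmcollector is still initialising, so retry with a delay
Restart=on-failure
RestartSec=120

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now gpfs-grafana-bridge.service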
Cheers, Greg

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie
Sent: Thursday, 6 July 2017 3:07 PM
To: gpfsug-discuss at spectrumscale.org
Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data

Greetings,

I'm currently setting up Grafana to interact with one of our Scale Clusters and I've followed the knowledge centre link in terms of setup.

https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm

However, while everything appears to be working, I'm not seeing any data coming through the reports within the Grafana server, even though I can see data in the Scale GUI.

The current environment:

[root at sc01n02 ~]# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         sc01.spectrum
  GPFS cluster id:           18085710661892594990
  GPFS UID domain:           sc01.spectrum
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name  IP address  Admin node name  Designation
 ------------------------------------------------------------------
   1   sc01n01           10.2.12.11  sc01n01          quorum-manager-perfmon
   2   sc01n02           10.2.12.12  sc01n02          quorum-manager-perfmon
   3   sc01n03           10.2.12.13  sc01n03          quorum-manager-perfmon

[root at sc01n02 ~]#

[root at sc01n02 ~]# mmlsconfig
Configuration data for cluster sc01.spectrum:
---------------------------------------------
clusterName sc01.spectrum
clusterId 18085710661892594990
autoload yes
profile gpfsProtocolDefaults
dmapiFileHandleSize 32
minReleaseLevel 4.2.2.0
ccrEnabled yes
cipherList AUTHONLY
maxblocksize 16M
[cesNodes]
maxMBpS 5000
numaMemoryInterleave yes
enforceFilesetQuotaOnRoot yes
workerThreads 512
[common]
tscCmdPortRange 60000-61000
cesSharedRoot /ibm/cesSharedRoot/ces
cifsBypassTraversalChecking yes
syncSambaMetadataOps yes
cifsBypassShareLocksOnRename yes
adminMode central

File systems in cluster sc01.spectrum:
--------------------------------------
/dev/cesSharedRoot
/dev/icos_demo
/dev/scale01
[root at sc01n02 ~]#

[root at sc01n02 ~]# systemctl status pmcollector
● pmcollector.service - LSB: Start the ZIMon performance monitor collector.
   Loaded: loaded (/etc/rc.d/init.d/pmcollector)
   Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago
     Docs: man:systemd-sysv-generator(8)
 Main PID: 2693 (ZIMonCollector)
   CGroup: /system.slice/pmcollector.service
           ├─2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg...
           └─2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth...

May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance mon......
May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor collector...
May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance moni...r..
Hint: Some lines were ellipsized, use -l to show in full.

From Grafana Server: [screenshot image002.jpg]

when I send a set of files to the cluster (3.8GB) I can see performance metrics within the Scale GUI [screenshot image004.jpg]

yet from the Grafana Dashboard I'm not seeing any data points [screenshot image006.jpg]

Can anyone provide some hints as to what might be happening?

Regards,

Andrew Beattie
Software Defined Storage - IT Specialist
Phone: 614-2133-7927
E-mail: abeattie at au1.ibm.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 19427 bytes Desc: image002.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 84412 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 37285 bytes Desc: image006.jpg URL: From janfrode at tanso.net Wed Jul 19 12:09:48 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 19 Jul 2017 11:09:48 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: Nils Haustein did such a migration from v7000 Unified to ESS last year. Used SOBAR to avoid recalls from HSM. I believe he wrote a whitepaper on the process.. -jf tir. 18. jul. 2017 kl. 21.21 skrev Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk>: > So just following up on my questions from January. > > We tried to do 2. I.e. Restore to a new file-system with different block > sizes. It got part way through creating the file-sets on the new SOBAR > file-system and then GPFS asserts and crashes... We weren't actually > intentionally trying to move block sizes, but because we were restoring > from a traditional SAN based system to a shiny new GNR based system, we'd > manually done the FS create steps. > > I have a PMR open now. I don't know if someone internally in IBM actually > tried this after my emails, as apparently there is a similar internal > defect which is ~6 months old... > > Simon > > From: on behalf of Marc A > Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: Friday, 20 January 2017 at 17:57 > > To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions > > I worked on some aspects of SOBAR, but without studying and testing the > commands - I'm not in a position right now to give simple definitive > answers - > having said that.... > > Generally your questions are reasonable and the answer is: "Yes it should > be possible to do that, but you might be going a bit beyond the design > point.., > so you'll need to try it out on a (smaller) test system with some smaller > tedst files. > > Point by point. > > 1. If SOBAR is unable to restore a particular file, perhaps because the > premigration did not complete -- you should only lose that particular file, > and otherwise "keep going". > > 2. I think SOBAR helps you build a similar file system to the original, > including block sizes. So you'd have to go in and tweak the file system > creation step(s). > I think this is reasonable... If you hit a problem... IMO that would be a > fair APAR. > > 3. Similar to 2. > > > > > > From: "Simon Thompson (Research Computing - IT Services)" < > S.J.Thompson at bham.ac.uk> > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 01/20/2017 10:44 AM > Subject: [gpfsug-discuss] SOBAR questions > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We've recently been looking at deploying SOBAR to support DR of some of > our file-systems, I have some questions (as ever!) that I can't see are > clearly documented, so was wondering if anyone has any insight on this. > > 1. If we elect not to premigrate certain files, are we still able to use > SOBAR? 
We are happy to take a hit that those files will never be available > again, but some are multi TB files which change daily and we can't stream > to tape effectively. > > 2. When doing a restore, does the block size of the new SOBAR'd to > file-system have to match? For example the old FS was 1MB blocks, the new > FS we create with 2MB blocks. Will this work (this strikes me as one way > we might be able to migrate an FS to a new block size?)? > > 3. If the file-system was originally created with an older GPFS code but > has since been upgraded, does restore work, and does it matter what client > code? E.g. We have a file-system that was originally 3.5.x, its been > upgraded over time to 4.2.2.0. Will this work if the client code was say > 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 > (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file > system version". Say there was 4.2.2.5 which created version 16.01 > file-system as the new FS, what would happen? > > This sort of detail is missing from: > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s > cale.v4r22.doc/bl1adv_sobarrestore.htm > > But is probably quite important for us to know! > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 19 12:26:43 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 19 Jul 2017 11:26:43 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: Getting this: python zimonGrafanaIntf.py ?s < pmcollector host> via system is a bit of a tricky process, since this process will abort unless the pmcollector is fully up. With a large database, I?ve seen it take 3-5 mins for pmcollector to fully initialize. I?m sure a simple ?sleep and try again? wrapper would take care of that. It?s on my lengthy to-do list! On the CherryPy version - I run the bridge on my RH/Centos system with python 3.4 and used ?pip install cherrypy? and it picked up the latest version. Seems to work just fine. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Greg.Lehmann at csiro.au" Reply-To: gpfsug main discussion list Date: Wednesday, July 19, 2017 at 2:54 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data I?m having a play with this now too. Has anybody coded a systemd unit to handle step 2b in the knowledge centre article ? bridge creation on the gpfs side? It would save me a bit of effort. I?m also wondering about the CherryPy version. It looks like this has been developed on SLES which has the newer version mentioned as a standard package and yet RHEL with an older version of CherryPy is perhaps more common as it seems to have the best support for features of GPFS, like object and block protocols. Maybe SLES is in favour now? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From MDIETZ at de.ibm.com Wed Jul 19 14:05:49 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Wed, 19 Jul 2017 15:05:49 +0200 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. > > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? 
ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Wed Jul 19 14:28:23 2017 From: david_johnson at brown.edu (David Johnson) Date: Wed, 19 Jul 2017 09:28:23 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: I have opened a PMR, and the official response reflects what you just posted. In addition, it seems there are some performance issues with Python 2 that will be improved with eventual migration to Python 3. I was unaware of the mmhealth functions that the mmsysmon daemon provides. The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded. I suppose it would be possible to turn off mmsysmon during the benchmarking, but I appreciate the effort at streamlining the monitor service. Cutting back on fork/exec, better python, less polling, more notifications? all good. Thanks for the details, ? ddj > On Jul 19, 2017, at 9:05 AM, Mathias Dietz wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. 
> > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdharris at us.ibm.com Wed Jul 19 15:40:17 2017 From: mdharris at us.ibm.com (Michael D Harris) Date: Wed, 19 Jul 2017 10:40:17 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: Hi David, Re: "The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded." MPI workloads show the most mmhealth impact. Specifically the more sensitive the workload is to jitter the higher the potential impact. The mmhealth config interval, as per Mathias's link, is a scalar applied to all monitor interval values in the configuration file. As such it currently modifies the server side monitoring and health reporting in addition to mitigating mpi client impact. So "medium" == 5 is a good perhaps reasonable value - whereas the "slow" == 10 scalar may be too infrequent for your server side monitoring and reporting (so your 30 second update becomes 5 minutes). The clock alignment that Mathias mentioned is a new investigatory undocumented tool for MPI workloads. It nearly completely removes all mmhealth MPI jitter while retaining default monitor intervals. It also naturally generates thundering herds of all client reporting to the quorum nodes. So while you may mitigate the client MPI jitter you may severely impact the server throughput on those intervals if not also exceed connection and thread limits. Configuring "clients" separately from "servers" without resorting to alignment is another area of investigation. I'm not familiar with your PMR but as Mathias mentioned "mmhealth config interval medium" would be a good start. In testing that Kums and I have done the "mmhealth config interval medium" value provides mitigation almost as good as the mentioned clock alignment for MPI for say a psnap with barrier type workload . 
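To make that concrete, a rough sketch of checking the cost on a client and then applying the change discussed above; the pgrep/pidstat sampling is just a convenience and not from the original reply, and the mmhealth syntax should be double-checked against the Knowledge Center page Mathias linked:

# sample the monitor's CPU use on a client node for about a minute
pid=$(pgrep -of mmsysmon.py)
pidstat -p "$pid" 10 6          # or: top -b -d 10 -n 6 -p "$pid"

# relax the monitoring interval (4.2.3+), as suggested above, then confirm
# that health reporting still updates at an acceptable rate
mmhealth config interval medium
mmhealth node show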
Regards, Mike Harris IBM Spectrum Scale - Core Team From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/19/2017 09:28 AM Subject: gpfsug-discuss Digest, Vol 66, Issue 30 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: mmsysmon.py revisited (Mathias Dietz) 2. Re: mmsysmon.py revisited (David Johnson) ---------------------------------------------------------------------- Message: 1 Date: Wed, 19 Jul 2017 15:05:49 +0200 From: "Mathias Dietz" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited Message-ID: Content-Type: text/plain; charset="iso-8859-1" thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. 
> > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/8c0e33e9/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 19 Jul 2017 09:28:23 -0400 From: David Johnson To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited Message-ID: Content-Type: text/plain; charset="utf-8" I have opened a PMR, and the official response reflects what you just posted. In addition, it seems there are some performance issues with Python 2 that will be improved with eventual migration to Python 3. I was unaware of the mmhealth functions that the mmsysmon daemon provides. The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded. I suppose it would be possible to turn off mmsysmon during the benchmarking, but I appreciate the effort at streamlining the monitor service. Cutting back on fork/exec, better python, less polling, more notifications? all good. Thanks for the details, ? ddj > On Jul 19, 2017, at 9:05 AM, Mathias Dietz wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. 
(mmhealth config interval) > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm < https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss < http://gpfsug.org/mailman/listinfo/gpfsug-discuss> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/669c525b/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 66, Issue 30 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathon.anderson at colorado.edu Wed Jul 19 18:52:14 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 19 Jul 2017 17:52:14 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? ~jonathon On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. 
> > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Wed Jul 19 19:12:37 2017 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Wed, 19 Jul 2017 14:12:37 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. 
> > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. >> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. >> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Wed Jul 19 19:29:22 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 19 Jul 2017 18:29:22 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> References: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> Message-ID: OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU for packet processing, whereas Mellanox IB uses a discrete asic on the HBA. As a result, OPA is much more sensitive to task placement and interrupts, in our experience, because the host CPU load competes with the fabric IO processing load. 
~jonathon On 7/19/17, 12:12 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu" wrote: We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. 
>> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. >> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From john.hearns at asml.com Thu Jul 20 08:39:29 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 20 Jul 2017 07:39:29 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> Message-ID: This is really interesting. I know we can look at the interrupt rates of course, but is there a way we can quantify the effects of interrupts / OS jitter here? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: Wednesday, July 19, 2017 8:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU for packet processing, whereas Mellanox IB uses a discrete asic on the HBA. As a result, OPA is much more sensitive to task placement and interrupts, in our experience, because the host CPU load competes with the fabric IO processing load. ~jonathon On 7/19/17, 12:12 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu" wrote: We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. 
Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_4.2.3%2Fcom.ibm.spectrum.scale.v4r23.doc%2Fbl1adm_mmhealth.htm&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=Uzdg4ogcQwidNfi8TMp%2FdCMqnSLTFxU4y8n2ub%2F28xQ%3D&reserved=0 > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. >> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. 
>> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From MDIETZ at de.ibm.com Thu Jul 20 10:30:50 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Thu, 20 Jul 2017 11:30:50 +0200 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: Jonathon, its important to separate the two issues "high CPU consumption" and "CPU Jitter". As mentioned, we are aware of the CPU jitter issue and already put several improvements in place. (more to come with the next release) Did you try with a lower polling frequency and/or enabling clock alignment as Mike suggested ? Non-MPI workloads are usually not impacted by CPU jitter, but might be impacted by high CPU consumption. 
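To put numbers on the consumption side before going to support, sampling the monitor process with pidstat is usually enough. A minimal sketch - it assumes the sysstat package is installed, a single mmsysmon python process, and an arbitrary log path:

    # sample mmsysmon CPU usage every 10 seconds for an hour
    PID=$(pgrep -f mmsysmon | head -1)
    pidstat -u -h -p "$PID" 10 360 > /var/tmp/mmsysmon-cpu.log
    # cumulative CPU time used so far, straight from ps
    ps -o pid,etime,cputime,pcpu,comm -p "$PID"

That output, together with the mmhealth interval in use, gives support something concrete to compare against the lab numbers.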
But we don't see such such high CPU consumption in the lab and therefore ask affected customers to get in contact with IBM support to find the root cause. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/19/2017 07:52:14 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/19/2017 07:52 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > It might be a problem specific to your system environment or a > wrong configuration therefore please get in contact with IBM support > to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing > jitter in conflict with MPI on the shared Intel Omni-Path network, > in our case. > > We?ve already tried pursuing support on this through our vendor, > DDN, and got no-where. Eventually we were the ones who tried killing > mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU > consumption by mmsysmon on our test systems? isn?t helping. Do you > have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Mathias Dietz" on behalf of MDIETZ at de.ibm.com> wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for > the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because > it monitors the individual components and provides health state > information and error events. > > This information is needed by other Spectrum Scale components > (mmhealth command, the IBM Spectrum Scale GUI, Support tools, > Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > > much credit by dev or support. > > Over the last couple of month, the development team has put a > strong focus on this topic. > > In order to monitor the health of the individual components, > mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and > replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the > ability to configure the polling frequency to reduce the overhead. > (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/ > com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the > monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by > mmsysmon on our test systems. > > It might be a problem specific to your system environment or a > wrong configuration therefore please get in contact with IBM support > to analyze the root cause of the high usage. 
> > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by > mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on > every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 15:57:14 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 09:57:14 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR Message-ID: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> Hi There, I was running a bridge port services to push my stats to grafana. It was running fine until we started some rigorous IOPS testing on the cluster. Now its failing to start with the following error. Questions: 1. Any clues on it fix? 2. Is there anyway I can run this in a service/daemon mode rather than running in a screen session? 
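On question 2: the bridge is just a long-running python process, so a small systemd unit removes the need for a screen session and restarts it if it dies. This is only a sketch - the unit name, install path, python interpreter and collector host below are assumptions, adjust them to wherever zimonGrafanaIntf.py actually lives:

    # /etc/systemd/system/zimon-grafana-bridge.service  (name is arbitrary)
    [Unit]
    Description=ZIMon to Grafana bridge (zimonGrafanaIntf.py)
    After=network-online.target pmcollector.service

    [Service]
    Type=simple
    # replace the path and "localhost" with your bridge location and pmcollector host
    ExecStart=/usr/bin/python /opt/IBM/zimon/zimonGrafanaIntf.py -s localhost
    Restart=on-failure
    RestartSec=30

    [Install]
    WantedBy=multi-user.target

Then reload and start it once:

    systemctl daemon-reload
    systemctl enable zimon-grafana-bridge.service
    systemctl start zimon-grafana-bridge.service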
[root at linuscs107 zimonGrafanaIntf]# python zimonGrafanaIntf.py -s linuscs107.gsc.wustl.edu Failed to initialize MetadataHandler, please check log file for reason #cat pmmonitor.log 2017-07-20 09:41:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 09:41:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 09:41:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting Thank you Chakri From Robert.Oesterlin at nuance.com Thu Jul 20 16:06:48 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 20 Jul 2017 15:06:48 +0000 Subject: [gpfsug-discuss] mmsysmon and CCR Message-ID: I recently ran into an issue where the frequency of mmsysmon polling (GPFS 4.2.2) was causing issues with CCR updates. I eventually ended decreasing the polling interval to 30 mins (I don?t have any CES) which seemed to solve the issue. So, if you have a large cluster, be on the lookout for CCR issues, if you have that configured. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 17:38:25 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 11:38:25 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> References: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> Message-ID: <31b9b441-f51c-c0d1-11e0-b01a070f9e4e@wustl.edu> cat zserver.log 2017-07-20 11:21:59,001 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) 2017-07-20 11:32:29,090 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) Thank you Chakri On 7/20/17 9:57 AM, Chakravarthy Girda wrote: > Hi There, > > I was running a bridge port services to push my stats to grafana. It > was running fine until we started some rigorous IOPS testing on the > cluster. Now its failing to start with the following error. > > Questions: > > 1. Any clues on it fix? > 2. Is there anyway I can run this in a service/daemon mode rather than > running in a screen session? > > > [root at linuscs107 zimonGrafanaIntf]# python zimonGrafanaIntf.py -s > linuscs107.gsc.wustl.edu > Failed to initialize MetadataHandler, please check log file for reason > > #cat pmmonitor.log > > 2017-07-20 09:41:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 09:41:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 09:41:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > > > Thank you > Chakri > > > > > From Robert.Oesterlin at nuance.com Thu Jul 20 17:50:12 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 20 Jul 2017 16:50:12 +0000 Subject: [gpfsug-discuss] pmmonitor - ERROR Message-ID: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> This looks like the Grafana bridge could not connect to the pmcollector process - is it running normally? See if some of the normal ?mmprefmon? 
commands work and/or look at the log file on the pmcollector node. (under /var/log/zimon) You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) Bob Oesterlin Sr Principal Storage Engineer, Nuance On 7/20/17, 11:38 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Chakravarthy Girda" wrote: 2017-07-20 11:32:29,090 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) From mdharris at us.ibm.com Thu Jul 20 17:55:56 2017 From: mdharris at us.ibm.com (Michael D Harris) Date: Thu, 20 Jul 2017 12:55:56 -0400 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 66, Issue 34 In-Reply-To: References: Message-ID: Hi Bob, The CCR monitor interval is addressed in 4.2.3 or 4.2.3 ptf1 Regards, Mike Harris Spectrum Scale Development - Core Team -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 18:12:09 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 12:12:09 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> Message-ID: Bob, Your correct. Found the issues with pmcollector services. Fixed issues with pmcollector, resolved the issues. Thank you Chakri On 7/20/17 11:50 AM, Oesterlin, Robert wrote: > You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) From cgirda at wustl.edu Thu Jul 20 18:30:03 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 12:30:03 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> Message-ID: <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> Bob, Actually the pmcollector service died in 5min. 2017-07-20 12:11:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:16:29,470 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:16:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:21:29,384 - pmmonitor - ERROR - QueryHandler: Socket connection broken, received no data 2017-07-20 12:21:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 12:21:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 12:21:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting Thank you Chakri On 7/20/17 12:12 PM, Chakravarthy Girda wrote: > Bob, > > Your correct. Found the issues with pmcollector services. Fixed issues > with pmcollector, resolved the issues. > > > Thank you > > Chakri > > > On 7/20/17 11:50 AM, Oesterlin, Robert wrote: >> You will also see this node when the pmcollector process is still initializing. 
(reading in the existing database, not ready to service requests) From cgirda at wustl.edu Thu Jul 20 21:03:56 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 15:03:56 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> Message-ID: For now I switched the "zimonGrafanaIntf" to port "4262". So far it didn't crash the pmcollector. Will wait for some more time to ensure its working. * Can we start this process in a daemon or service mode? Thank you Chakri On 7/20/17 12:30 PM, Chakravarthy Girda wrote: > Bob, > > Actually the pmcollector service died in 5min. > > 2017-07-20 12:11:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:16:29,470 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:16:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:21:29,384 - pmmonitor - ERROR - QueryHandler: Socket > connection broken, received no data > 2017-07-20 12:21:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 12:21:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 12:21:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > > Thank you > Chakri > > > On 7/20/17 12:12 PM, Chakravarthy Girda wrote: >> Bob, >> >> Your correct. Found the issues with pmcollector services. Fixed issues >> with pmcollector, resolved the issues. >> >> >> Thank you >> >> Chakri >> >> >> On 7/20/17 11:50 AM, Oesterlin, Robert wrote: >>> You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From cgirda at wustl.edu Thu Jul 20 21:42:09 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 15:42:09 -0500 Subject: [gpfsug-discuss] zimonGrafanaIntf template variable Message-ID: <00372fdc-a0b7-26ac-84c1-aa32c78e4261@wustl.edu> Hi, I imported the pre-built grafana dashboard. https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/a180eb7e-9161-4e07-a6e4-35a0a076f7b3/attachment/5e9a5886-5bd9-4a6f-919e-bc66d16760cf/media/default%20dashboards%20set.zip Get updates from few graphs but not all. I realize that I need to update the template variables. Eg:- I get into the "File Systems View" Variable ( gpfsMetrics_fs1 ) --> Query ( gpfsMetrics_fs1 ) Regex ( /.*[^gpfs_fs_inode_used|gpfs_fs_inode_alloc|gpfs_fs_inode_free|gpfs_fs_inode_max]/ ) Question: * How can I execute the above Query and regex to fix the issues. * Is there any document on CLI options? Thank you Chakri -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jul 21 22:13:17 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 21 Jul 2017 17:13:17 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Message-ID: <28986.1500671597@turing-police.cc.vt.edu> So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive service. 
Inode size is 4K, and we had a requirement to encrypt-at-rest, so encryption is in play as well. Data is replicated 2x and fragment size is 32K. I was investigating how much data-in-inode would help deal with users who put large trees of small files into the archive (yes, I know we can use applypolicy with external programs to tarball offending directories, but that's a separate discussion ;) ## ls -ls * 64 -rw-r--r-- 1 root root 2048 Jul 21 14:47 random.data 64 -rw-r--r-- 1 root root 512 Jul 21 14:48 random.data.1 64 -rw-r--r-- 1 root root 128 Jul 21 14:50 random.data.2 64 -rw-r--r-- 1 root root 32 Jul 21 14:50 random.data.3 64 -rw-r--r-- 1 root root 16 Jul 21 14:50 random.data.4 Hmm.. I was expecting at least *some* of these to fit in the inode, and not take 2 32K blocks... ## mmlsattr -d -L random.data.4 file name: random.data.4 metadata replication: 2 max 2 data replication: 2 max 2 immutable: no appendOnly: no flags: storage pool name: system fileset name: root snapshot name: creation time: Fri Jul 21 14:50:51 2017 Misc attributes: ARCHIVE Encrypted: yes gpfs.Encryption: 0x4541 (... another 296 hex digits) EncPar 'AES:256:XTS:FEK:HMACSHA512' type: wrapped FEK WrpPar 'AES:KWRAP' CmbPar 'XORHMACSHA512' KEY-97c7f4b7-06cb-4a53-b317-1c187432dc62:archKEY1_gpfsG1 Hmm.. Doesn't *look* like enough extended attributes to prevent storing even 16 bytes in the inode, should be room for around 3.5K minus the above 250 bytes or so of attributes.... What am I missing here? Does "encrypted" or LTFS/EE disable data-in-inode? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From oehmes at gmail.com Fri Jul 21 23:04:32 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 21 Jul 2017 22:04:32 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <28986.1500671597@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: Hi, i talked with a few others to confirm this, but unfortunate this is a limitation of the code today (maybe not well documented which we will look into). Encryption only encrypts data blocks, it doesn't encrypt metadata. Hence, if encryption is enabled, we don't store data in the inode, because then it wouldn't be encrypted. For the same reason HAWC and encryption are incompatible. Sven On Fri, Jul 21, 2017 at 2:13 PM wrote: > So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive > service. > Inode size is 4K, and we had a requirement to encrypt-at-rest, so > encryption > is in play as well. Data is replicated 2x and fragment size is 32K. > > I was investigating how much data-in-inode would help deal with users who > put > large trees of small files into the archive (yes, I know we can use > applypolicy > with external programs to tarball offending directories, but that's a > separate > discussion ;) > > ## ls -ls * > 64 -rw-r--r-- 1 root root 2048 Jul 21 14:47 random.data > 64 -rw-r--r-- 1 root root 512 Jul 21 14:48 random.data.1 > 64 -rw-r--r-- 1 root root 128 Jul 21 14:50 random.data.2 > 64 -rw-r--r-- 1 root root 32 Jul 21 14:50 random.data.3 > 64 -rw-r--r-- 1 root root 16 Jul 21 14:50 random.data.4 > > Hmm.. I was expecting at least *some* of these to fit in the inode, and > not take 2 32K blocks... 
> > ## mmlsattr -d -L random.data.4 > file name: random.data.4 > metadata replication: 2 max 2 > data replication: 2 max 2 > immutable: no > appendOnly: no > flags: > storage pool name: system > fileset name: root > snapshot name: > creation time: Fri Jul 21 14:50:51 2017 > Misc attributes: ARCHIVE > Encrypted: yes > gpfs.Encryption: 0x4541 (... another 296 hex digits) > EncPar 'AES:256:XTS:FEK:HMACSHA512' > type: wrapped FEK WrpPar 'AES:KWRAP' CmbPar 'XORHMACSHA512' > KEY-97c7f4b7-06cb-4a53-b317-1c187432dc62:archKEY1_gpfsG1 > > Hmm.. Doesn't *look* like enough extended attributes to prevent storing > even > 16 bytes in the inode, should be room for around 3.5K minus the above 250 > bytes > or so of attributes.... > > What am I missing here? Does "encrypted" or LTFS/EE disable data-in-inode? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jul 21 23:24:13 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 21 Jul 2017 18:24:13 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <33069.1500675853@turing-police.cc.vt.edu> On Fri, 21 Jul 2017 22:04:32 -0000, Sven Oehme said: > i talked with a few others to confirm this, but unfortunate this is a > limitation of the code today (maybe not well documented which we will look > into). Encryption only encrypts data blocks, it doesn't encrypt metadata. > Hence, if encryption is enabled, we don't store data in the inode, because > then it wouldn't be encrypted. For the same reason HAWC and encryption are > incompatible. I can live with that restriction if it's documented better, thanks... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From p.childs at qmul.ac.uk Mon Jul 24 10:29:49 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 09:29:49 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. Message-ID: <1500888588.571.3.camel@qmul.ac.uk> We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. From ilan84 at gmail.com Mon Jul 24 11:36:41 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Mon, 24 Jul 2017 13:36:41 +0300 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication Message-ID: Hi, I have gpfs with 2 Nodes (redhat). 
I am trying to create NFS share - So I would be able to mount and access it from another linux machine. I receive error: Current authentication: none is invalid. What do i need to configure ? PLEASE NOTE: I dont have the SMB package at the moment, I dont want authentication on the NFS export.. While trying to create NFS (I execute the following): [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" I receive the following error: [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmuserauth service list FILE access not configured PARAMETERS VALUES ------------------------------------------------- OBJECT access not configured PARAMETERS VALUES ------------------------------------------------- [root at LH20-GPFS1 ~]# Some additional information on cluster: ============================== [root at LH20-GPFS1 ~]# mmlsmgr file system manager node ---------------- ------------------ fs_gpfs01 10.10.158.61 (LH20-GPFS1) Cluster manager node: 10.10.158.61 (LH20-GPFS1) [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: LH20-GPFS1 GPFS cluster id: 10777108240438931454 GPFS UID domain: LH20-GPFS1 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 quorum From jonathan at buzzard.me.uk Mon Jul 24 12:43:10 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Jul 2017 12:43:10 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <28986.1500671597@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <1500896590.4387.167.camel@buzzard.me.uk> On Fri, 2017-07-21 at 17:13 -0400, valdis.kletnieks at vt.edu wrote: > So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive service. > Inode size is 4K, and we had a requirement to encrypt-at-rest, so encryption > is in play as well. Data is replicated 2x and fragment size is 32K. > For an archive service how about only accepting files in actual "archive" formats and then severely restricting the number of files a user can have? By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. Has a number of effects. Firstly it makes the files "big" so they move to tape efficiently. It also makes it less likely the end user will try and use it as an general purpose file server. As it's an archive there should be no problem for the user to bundle all the files into a .zip file or similar. Noting that Windows Vista and up handle ZIP64 files getting around the older 4GB and 65k files limit. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
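If you go the archive-formats-only route, the policy engine can at least report who is dropping loose small files before anything is enforced. A rough, list-only sketch - rule names, the size threshold and the extension list are placeholders, and -I defer means nothing is moved or deleted:

    /* smallfiles.pol - list small non-archive files, illustrative only */
    RULE EXTERNAL LIST 'loose_small_files' EXEC ''
    RULE 'find_loose' LIST 'loose_small_files'
         SHOW(VARCHAR(USER_ID) || ' ' || VARCHAR(FILE_SIZE))
         WHERE FILE_SIZE < 1048576
           AND LOWER(NAME) NOT LIKE '%.zip'
           AND LOWER(NAME) NOT LIKE '%.tar.gz'
           AND LOWER(NAME) NOT LIKE '%.tgz'

    # generate the list files without acting on anything
    mmapplypolicy <fsname> -P smallfiles.pol -I defer -f /tmp/loose

Summing the resulting list per UID gives a quick view of which users would be affected by a files-per-user limit.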
From stefan.dietrich at desy.de Mon Jul 24 13:19:47 2017 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Mon, 24 Jul 2017 14:19:47 +0200 (CEST) Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: <1981958989.2609398.1500898787132.JavaMail.zimbra@desy.de> Yep, have look at this Gist [1] The unit files assumes some paths and users, which are created during the installation of my RPM. [1] https://gist.github.com/stdietrich/b3b985f872ea648d6c03bb6249c44e72 Regards, Stefan ----- Original Message ----- > From: "Greg Lehmann" > To: gpfsug-discuss at spectrumscale.org > Sent: Wednesday, July 19, 2017 9:53:58 AM > Subject: Re: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data > I?m having a play with this now too. Has anybody coded a systemd unit to handle > step 2b in the knowledge centre article ? bridge creation on the gpfs side? It > would save me a bit of effort. > > > > I?m also wondering about the CherryPy version. It looks like this has been > developed on SLES which has the newer version mentioned as a standard package > and yet RHEL with an older version of CherryPy is perhaps more common as it > seems to have the best support for features of GPFS, like object and block > protocols. Maybe SLES is in favour now? > > > > Cheers, > > > > Greg > > > > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie > Sent: Thursday, 6 July 2017 3:07 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no > data > > > > > Greetings, > > > > > > > > > I'm currently setting up Grafana to interact with one of our Scale Clusters > > > and i've followed the knowledge centre link in terms of setup. 
> > > > > > [ > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm > | > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm > ] > > > > > > However while everything appears to be working i'm not seeing any data coming > through the reports within the grafana server, even though I can see data in > the Scale GUI > > > > > > The current environment: > > > > > > [root at sc01n02 ~]# mmlscluster > > > GPFS cluster information > ======================== > GPFS cluster name: sc01.spectrum > GPFS cluster id: 18085710661892594990 > GPFS UID domain: sc01.spectrum > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > > Node Daemon node name IP address Admin node name Designation > ------------------------------------------------------------------ > 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon > 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon > 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon > > > [root at sc01n02 ~]# > > > > > > > > > [root at sc01n02 ~]# mmlsconfig > Configuration data for cluster sc01.spectrum: > --------------------------------------------- > clusterName sc01.spectrum > clusterId 18085710661892594990 > autoload yes > profile gpfsProtocolDefaults > dmapiFileHandleSize 32 > minReleaseLevel 4.2.2.0 > ccrEnabled yes > cipherList AUTHONLY > maxblocksize 16M > [cesNodes] > maxMBpS 5000 > numaMemoryInterleave yes > enforceFilesetQuotaOnRoot yes > workerThreads 512 > [common] > tscCmdPortRange 60000-61000 > cesSharedRoot /ibm/cesSharedRoot/ces > cifsBypassTraversalChecking yes > syncSambaMetadataOps yes > cifsBypassShareLocksOnRename yes > adminMode central > > > File systems in cluster sc01.spectrum: > -------------------------------------- > /dev/cesSharedRoot > /dev/icos_demo > /dev/scale01 > [root at sc01n02 ~]# > > > > > > > > > [root at sc01n02 ~]# systemctl status pmcollector > ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. > Loaded: loaded (/etc/rc.d/init.d/pmcollector) > Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago > Docs: man:systemd-sysv-generator(8) > Main PID: 2693 (ZIMonCollector) > CGroup: /system.slice/pmcollector.service > ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... > ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... > > > May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance > mon...... > May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor > collector... > May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance > moni...r.. > Hint: Some lines were ellipsized, use -l to show in full. > > > > > > From Grafana Server: > > > > > > > > > > > > > > > when I send a set of files to the cluster (3.8GB) I can see performance metrics > within the Scale GUI > > > > > > > > > > > > yet from the Grafana Dashboard im not seeing any data points > > > > > > > > > > > > Can anyone provide some hints as to what might be happening? 
> > > > > > > > > > > > Regards, > > > > > > > > > Andrew Beattie > > > Software Defined Storage - IT Specialist > > > Phone: 614-2133-7927 > > > E-mail: [ mailto:abeattie at au1.ibm.com | abeattie at au1.ibm.com ] > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jjdoherty at yahoo.com Mon Jul 24 14:11:12 2017 From: jjdoherty at yahoo.com (Jim Doherty) Date: Mon, 24 Jul 2017 13:11:12 +0000 (UTC) Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> Message-ID: <261384244.3866909.1500901872347@mail.yahoo.com> There are 3 places that the GPFS mmfsd uses memory? the pagepool? plus 2 shared memory segments.?? To see the memory utilization of the shared memory segments run the command?? mmfsadm dump malloc .??? The statistics for memory pool id 2 is where? maxFilesToCache/maxStatCache objects are? and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.?? You might want to upgrade to later PTF? as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.?? On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 24 14:30:49 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 13:30:49 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <261384244.3866909.1500901872347@mail.yahoo.com> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> Message-ID: <1500903047.571.7.camel@qmul.ac.uk> I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. 
[root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjdoherty at yahoo.com Mon Jul 24 15:10:45 2017 From: jjdoherty at yahoo.com (Jim Doherty) Date: Mon, 24 Jul 2017 14:10:45 +0000 (UTC) Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1500903047.571.7.camel@qmul.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> Message-ID: <1770436429.3911327.1500905445052@mail.yahoo.com> How are you identifying? the high memory usage???? On Monday, July 24, 2017 9:30 AM, Peter Childs wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. 
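One way to tell whether this is a steady leak or a step change is to log the daemon size alongside mmdiag --memory for a few days. A minimal sketch - the log path and interval are arbitrary:

    # append mmfsd size and the GPFS memory-pool stats once an hour
    while true; do
        {
          date
          ps -o pid,vsz,rss,comm -C mmfsd
          /usr/lpp/mmfs/bin/mmdiag --memory
        } >> /var/tmp/mmfsd-mem.log
        sleep 3600
    done

If the RSS climbs while the pool figures stay flat, that points at heap growth outside the reported pools, which is worth raising with support.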
The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory ===mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")???????????128 bytes in use???17500049370 hard limit on memory usage???????1048576 bytes committed to regions?????????????1 number of regions???????????555 allocations???????????555 frees?????????????0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment")??????42179592 bytes in use???17500049370 hard limit on memory usage??????56623104 bytes committed to regions?????????????9 number of regions????????100027 allocations?????????79624 frees?????????????0 allocation failures Statistics for MemoryPool id 3 ("Token Manager")???????2099520 bytes in use???17500049370 hard limit on memory usage??????16778240 bytes committed to regions?????????????1 number of regions?????????????4 allocations?????????????0 frees?????????????0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory? the pagepool? plus 2 shared memory segments.?? To see the memory utilization of the shared memory segments run the command?? mmfsadm dump malloc .??? The statistics for memory pool id 2 is where? maxFilesToCache/maxStatCache objects are? and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.?? You might want to upgrade to later PTF? as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.?? On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter ChildsITS Research StorageQueen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 24 15:21:27 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 14:21:27 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. 
In-Reply-To: <1770436429.3911327.1500905445052@mail.yahoo.com> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> Message-ID: <1500906086.571.9.camel@qmul.ac.uk> top but ps gives the same value. [root at dn29 ~]# ps auww -q 4444 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 4444 2.7 22.3 10537600 5472580 ? S wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. On Monday, July 24, 2017 5:29 AM, Peter Childs wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. 
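Since the RSS here (around 5.4 GB) is well beyond pagepool plus the shared segments mmdiag reports, it may be worth breaking the daemon's mappings down directly. A minimal sketch with standard tools - smem is optional and may need installing:

    # largest mappings inside mmfsd; the pagepool shows up as one big region
    pmap -x $(pgrep -x mmfsd) | sort -k3 -n | tail -20
    # per-process USS/PSS/RSS summary, if smem is available
    smem -k -P mmfsd

Whatever does not line up with the pagepool or the shared segments (heap, thread stacks from a higher workerThreads setting, and so on) is the interesting part to take to support.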
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.huffman at crick.ac.uk Mon Jul 24 15:40:51 2017 From: adam.huffman at crick.ac.uk (Adam Huffman) Date: Mon, 24 Jul 2017 14:40:51 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1500906086.571.9.camel@qmul.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> <1500906086.571.9.camel@qmul.ac.uk> Message-ID: <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> smem is recommended here Cheers, Adam -- Adam Huffman Senior HPC and Cloud Systems Engineer The Francis Crick Institute 1 Midland Road London NW1 1AT T: 020 3796 1175 E: adam.huffman at crick.ac.uk W: www.crick.ac.uk On 24 Jul 2017, at 15:21, Peter Childs > wrote: top but ps gives the same value. [root at dn29 ~]# ps auww -q 4444 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 4444 2.7 22.3 10537600 5472580 ? S> wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. 
On Monday, July 24, 2017 5:29 AM, Peter Childs > wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From jamiedavis at us.ibm.com Mon Jul 24 15:45:26 2017 From: jamiedavis at us.ibm.com (James Davis) Date: Mon, 24 Jul 2017 14:45:26 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <33069.1500675853@turing-police.cc.vt.edu> References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: attlisjw.dat Type: application/octet-stream Size: 497 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jul 24 15:50:57 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 24 Jul 2017 14:50:57 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> Message-ID: I suppose the distinction between data, metadata and data IN metadata could be made. Whilst it is clear to me (us) now, perhaps the thought was that the data would be encrypted even if it was stored inside the metadata. My two pence. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of James Davis Sent: 24 July 2017 15:45 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Hey all, On the documentation of encryption restrictions and encryption/HAWC interplay... The encryption documentation currently states: "Secure storage uses encryption to make data unreadable to anyone who does not possess the necessary encryption keys...Only data, not metadata, is encrypted." The HAWC restrictions include: "Encrypted data is never stored in the recovery log..." If this is unclear, I'm open to suggestions for improvements. Cordially, Jamie ----- Original message ----- From: valdis.kletnieks at vt.edu Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Date: Fri, Jul 21, 2017 6:24 PM On Fri, 21 Jul 2017 22:04:32 -0000, Sven Oehme said: > i talked with a few others to confirm this, but unfortunate this is a > limitation of the code today (maybe not well documented which we will look > into). Encryption only encrypts data blocks, it doesn't encrypt metadata. > Hence, if encryption is enabled, we don't store data in the inode, because > then it wouldn't be encrypted. For the same reason HAWC and encryption are > incompatible. I can live with that restriction if it's documented better, thanks... [Document Icon]attq4saq.dat Type: application/pgp-signature Name: attq4saq.dat _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Mon Jul 24 15:57:13 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Jul 2017 15:57:13 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <33069.1500675853@turing-police.cc.vt.edu> , <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <1500908233.4387.194.camel@buzzard.me.uk> On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: > Hey all, > > On the documentation of encryption restrictions and encryption/HAWC > interplay... > > The encryption documentation currently states: > > "Secure storage uses encryption to make data unreadable to anyone who > does not possess the necessary encryption keys...Only data, not > metadata, is encrypted." > > The HAWC restrictions include: > > "Encrypted data is never stored in the recovery log..." > > If this is unclear, I'm open to suggestions for improvements. > Just because *DATA* is stored in the metadata does not make it magically metadata. It's still data so you could quite reasonably conclude that it is encrypted. We have now been disabused of this, but the documentation is not clear and needs clarifying. Perhaps say metadata blocks are not encrypted. Or just a simple data stored in inodes is not encrypted would suffice. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Mon Jul 24 16:49:07 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Jul 2017 11:49:07 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? 
In-Reply-To: <1500896590.4387.167.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> Message-ID: <17702.1500911347@turing-police.cc.vt.edu> On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > For an archive service how about only accepting files in actual > "archive" formats and then severely restricting the number of files a > user can have? > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. After having dealt with users who fill up disk storage for almost 4 decades now, I'm fully aware of those advantages. :) ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" in 1978, and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ square feet, and now 8T drives are all over the place...) On the flip side, my current project is migrating 5 petabytes of data from our old archive system that didn't have such rules (mostly due to politics and the fact that the underlying XFS filesystem uses a 4K blocksize so it wasn't as big an issue), so I'm stuck with what people put in there years ago. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Mon Jul 24 16:49:26 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 24 Jul 2017 15:49:26 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: Ilan, you must create some type of authentication mechanism for CES to work properly first. If you want a quick and dirty way that would just use your local /etc/passwd try this. /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined Mark -----Original Message----- From: Ilan Schwarts [mailto:ilan84 at gmail.com] Sent: Monday, July 24, 2017 5:37 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication Hi, I have gpfs with 2 Nodes (redhat). I am trying to create NFS share - So I would be able to mount and access it from another linux machine. I receive error: Current authentication: none is invalid. What do i need to configure ? PLEASE NOTE: I dont have the SMB package at the moment, I dont want authentication on the NFS export.. While trying to create NFS (I execute the following): [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" I receive the following error: [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. 
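For reference, a hedged sketch of the sequence being suggested here, run from a CES protocol node; the path and the wide-open client wildcard simply mirror the post above and should be tightened for anything beyond a lab:

# File authentication handled outside Spectrum Scale (local /etc/passwd, your own LDAP, ...)
/usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined

# NFSv3-only export that trusts the UID/GID the client presents
/usr/lpp/mmfs/bin/mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3,Squash=no_root_squash)"

# Verify
/usr/lpp/mmfs/bin/mmuserauth service list
/usr/lpp/mmfs/bin/mmnfs export list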
[root at LH20-GPFS1 ~]# mmuserauth service list FILE access not configured PARAMETERS VALUES ------------------------------------------------- OBJECT access not configured PARAMETERS VALUES ------------------------------------------------- [root at LH20-GPFS1 ~]# Some additional information on cluster: ============================== [root at LH20-GPFS1 ~]# mmlsmgr file system manager node ---------------- ------------------ fs_gpfs01 10.10.158.61 (LH20-GPFS1) Cluster manager node: 10.10.158.61 (LH20-GPFS1) [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: LH20-GPFS1 GPFS cluster id: 10777108240438931454 GPFS UID domain: LH20-GPFS1 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 quorum _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions From valdis.kletnieks at vt.edu Mon Jul 24 17:35:34 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Jul 2017 12:35:34 -0400 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: <27469.1500914134@turing-police.cc.vt.edu> On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: > Hi, > I have gpfs with 2 Nodes (redhat). > I am trying to create NFS share - So I would be able to mount and > access it from another linux machine. > While trying to create NFS (I execute the following): > [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* > Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" You can get away with little to no authentication for NFSv3, but not for NFSv4. Try with Protocols=3 only and mmuserauth service create --type userdefined that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS client tells you". This of course only works sanely if each NFS export is only to a set of machines in the same administrative domain that manages their UID/GIDs. Exporting to two sets of machines that don't coordinate their UID/GID space is, of course, where hilarity and hijinks ensue.... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From luke.raimbach at googlemail.com Mon Jul 24 23:23:03 2017 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Mon, 24 Jul 2017 22:23:03 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> <1500906086.571.9.camel@qmul.ac.uk> <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> Message-ID: Switch of CCR and see what happens. On Mon, 24 Jul 2017, 15:40 Adam Huffman, wrote: > smem is recommended here > > Cheers, > Adam > > -- > > Adam Huffman > Senior HPC and Cloud Systems Engineer > The Francis Crick Institute > 1 Midland Road > London NW1 1AT > > T: 020 3796 1175 > E: adam.huffman at crick.ac.uk > W: www.crick.ac.uk > > > > > > On 24 Jul 2017, at 15:21, Peter Childs wrote: > > > top > > but ps gives the same value. > > [root at dn29 ~]# ps auww -q 4444 > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > root 4444 2.7 22.3 10537600 5472580 ? S /usr/lpp/mmfs/bin/mmfsd > > Thanks for the help > > Peter. > > > On Mon, 2017-07-24 at 14:10 +0000, Jim Doherty wrote: > > How are you identifying the high memory usage? > > > On Monday, July 24, 2017 9:30 AM, Peter Childs > wrote: > > > I've had a look at mmfsadm dump malloc and it looks to agree with the > output from mmdiag --memory. and does not seam to account for the excessive > memory usage. > > The new machines do have idleSocketTimout set to 0 from what your saying > it could be related to keeping that many connections between nodes working. > > Thanks in advance > > Peter. > > > > > [root at dn29 ~]# mmdiag --memory > > === mmdiag: memory === > mmfsd heap size: 2039808 bytes > > > Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") > 128 bytes in use > 17500049370 hard limit on memory usage > 1048576 bytes committed to regions > 1 number of regions > 555 allocations > 555 frees > 0 allocation failures > > > Statistics for MemoryPool id 2 ("Shared Segment") > 42179592 bytes in use > 17500049370 hard limit on memory usage > 56623104 bytes committed to regions > 9 number of regions > 100027 allocations > 79624 frees > 0 allocation failures > > > Statistics for MemoryPool id 3 ("Token Manager") > 2099520 bytes in use > 17500049370 hard limit on memory usage > 16778240 bytes committed to regions > 1 number of regions > 4 allocations > 0 frees > 0 allocation failures > > > On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: > > There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 > shared memory segments. To see the memory utilization of the shared > memory segments run the command mmfsadm dump malloc . The statistics > for memory pool id 2 is where maxFilesToCache/maxStatCache objects are > and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. > > You might want to upgrade to later PTF as there was a PTF to fix a memory > leak that occurred in tscomm associated with network connection drops. > > > On Monday, July 24, 2017 5:29 AM, Peter Childs > wrote: > > > We have two GPFS clusters. > > One is fairly old and running 4.2.1-2 and non CCR and the nodes run > fine using up about 1.5G of memory and is consistent (GPFS pagepool is > set to 1G, so that looks about right.) 
> > The other one is "newer" running 4.2.1-3 with CCR and the nodes keep > increasing in there memory usage, starting at about 1.1G and are find > for a few days however after a while they grow to 4.2G which when the > node need to run real work, means the work can't be done. > > I'm losing track of what maybe different other than CCR, and I'm trying > to find some more ideas of where to look. > > I'm checked all the standard things like pagepool and maxFilesToCache > (set to the default of 4000), workerThreads is set to 128 on the new > gpfs cluster (against default 48 on the old) > > I'm not sure what else to look at on this one hence why I'm asking the > community. > > Thanks in advance > > Peter Childs > ITS Research Storage > Queen Mary University of London. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > The Francis Crick Institute Limited is a registered charity in England and > Wales no. 1140062 and a company registered in England and Wales no. > 06885462, with its registered office at 1 Midland Road London NW1 1AT > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 25 05:52:11 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 25 Jul 2017 07:52:11 +0300 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: <27469.1500914134@turing-police.cc.vt.edu> References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). 
>> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From ulmer at ulmer.org Tue Jul 25 06:33:13 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Tue, 25 Jul 2017 01:33:13 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500908233.4387.194.camel@buzzard.me.uk> References: <33069.1500675853@turing-police.cc.vt.edu> <28986.1500671597@turing-police.cc.vt.edu> <1500908233.4387.194.camel@buzzard.me.uk> Message-ID: <1233C5A4-A8C9-4A56-AEC3-AE65DBB5D346@ulmer.org> > On Jul 24, 2017, at 10:57 AM, Jonathan Buzzard > wrote: > > On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: >> Hey all, >> >> On the documentation of encryption restrictions and encryption/HAWC >> interplay... >> >> The encryption documentation currently states: >> >> "Secure storage uses encryption to make data unreadable to anyone who >> does not possess the necessary encryption keys...Only data, not >> metadata, is encrypted." >> >> The HAWC restrictions include: >> >> "Encrypted data is never stored in the recovery log..." >> >> If this is unclear, I'm open to suggestions for improvements. >> > > Just because *DATA* is stored in the metadata does not make it magically > metadata. It's still data so you could quite reasonably conclude that it > is encrypted. > [?] > JAB. +1. Also, "Encrypted data is never stored in the recovery log?" does not make it clear whether: The data that is supposed to be encrypted is not written to the recovery log. The data that is supposed to be encrypted is written to the recovery log, but is not encrypted there. Thanks, -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Tue Jul 25 10:02:14 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Jul 2017 10:02:14 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <17702.1500911347@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> Message-ID: <1500973334.4387.201.camel@buzzard.me.uk> On Mon, 2017-07-24 at 11:49 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > > > For an archive service how about only accepting files in actual > > "archive" formats and then severely restricting the number of files a > > user can have? > > > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. 
> > After having dealt with users who fill up disk storage for almost 4 decades > now, I'm fully aware of those advantages. :) > > ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" in 1978, > and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ square feet, and > now 8T drives are all over the place...) > > On the flip side, my current project is migrating 5 petabytes of data from our > old archive system that didn't have such rules (mostly due to politics and the > fact that the underlying XFS filesystem uses a 4K blocksize so it wasn't as big > an issue), so I'm stuck with what people put in there years ago. I would be tempted to zip up the directories and move them ziped ;-) JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From john.hearns at asml.com Tue Jul 25 10:30:28 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 25 Jul 2017 09:30:28 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500973334.4387.201.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: I agree with Jonathan. In my experience, if you look at why there are many small files being stored by researchers, these are either the results of data acquisition - high speed cameras, microscopes, or in my experience a wind tunnel. Or the images are a sequence of images produced by a simulation which are later post-processed into a movie or Ensight/Paraview format. When questioned, the resaechers will always say "but I would like to keep this data available just in case". In reality those files are never looked at again. And as has been said if you have a tape based archiving system you could end up with thousands of small files being spread all over your tapes. So it is legitimate to make zips / tars of directories like that. I am intrigued to see that GPFS has a policy facility which can call an external program. That is useful. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Buzzard Sent: Tuesday, July 25, 2017 11:02 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? On Mon, 2017-07-24 at 11:49 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > > > For an archive service how about only accepting files in actual > > "archive" formats and then severely restricting the number of files > > a user can have? > > > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. > > After having dealt with users who fill up disk storage for almost 4 > decades now, I'm fully aware of those advantages. :) > > ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" > in 1978, and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ > square feet, and now 8T drives are all over the place...) > > On the flip side, my current project is migrating 5 petabytes of data > from our old archive system that didn't have such rules (mostly due to > politics and the fact that the underlying XFS filesystem uses a 4K > blocksize so it wasn't as big an issue), so I'm stuck with what people put in there years ago. I would be tempted to zip up the directories and move them ziped ;-) JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7Ce8a4016223414177bf9408d4d33bdb31%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=pean0PRBgJJmtbZ7TwO%2BxiSvhKsba%2FRGI9VUCxhp6kM%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From jonathan at buzzard.me.uk Tue Jul 25 12:22:49 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Jul 2017 12:22:49 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: <1500981769.4387.222.camel@buzzard.me.uk> On Tue, 2017-07-25 at 09:30 +0000, John Hearns wrote: > I agree with Jonathan. > > In my experience, if you look at why there are many small files being > stored by researchers, these are either the results of data acquisition > - high speed cameras, microscopes, or in my experience a wind tunnel. > Or the images are a sequence of images produced by a simulation which > are later post-processed into a movie or Ensight/Paraview format. When > questioned, the resaechers will always say "but I would like to keep > this data available just in case". In reality those files are never > looked at again. And as has been said if you have a tape based > archiving system you could end up with thousands of small files being > spread all over your tapes. So it is legitimate to make zips / tars of > directories like that. > Note that rules on data retention may require them to keep them for 10 years, so it is not unreasonable. Letting them spew thousands of files into an "archive" is not sensible. I was thinking of ways of getting the users to do it, and I guess leaving them with zero available file number quota in the new system would force them to zip up their data so they could add new stuff ;-) Archives in my view should have no quota on the space, only quota's on the number of files. Of course that might not be very popular. On reflection I think I would use a policy to restrict to files ending with .zip/.ZIP only. It's an archive and this format is effectively open source, widely understood and cross platform, and with the ZIP64 version will now stand the test of time too. Given it's an archive I would have a script that ran around setting all the files to immutable 7 days after creation too. 
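As a rough sketch of that idea and nothing more: the file system path, file names and 7-day window below are made up, and the exact list-file naming and line format produced by mmapplypolicy should be checked against the documentation for your release before relying on it:

# Policy that lists ZIP archives more than 7 days old
cat > /var/tmp/lock-archives.pol <<'EOF'
RULE EXTERNAL LIST 'tolock' EXEC ''
RULE 'agedzips' LIST 'tolock'
  WHERE UPPER(NAME) LIKE '%.ZIP'
    AND (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME)) > 7
EOF

# -I defer with -f only writes the candidate list (here /var/tmp/lock-archives.list.tolock)
/usr/lpp/mmfs/bin/mmapplypolicy /archive -P /var/tmp/lock-archives.pol -I defer -f /var/tmp/lock-archives

# Each list line should end in " -- /full/path"; strip the prefix and set the immutable flag
sed 's/.* -- //' /var/tmp/lock-archives.list.tolock | while read -r f; do
    /usr/lpp/mmfs/bin/mmchattr -i yes "$f"
done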
Or maybe change the ownership and set a readonly ACL to the original user. Need to stop them changing stuff after the event if you are going to use to as part of your anti research fraud measures. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Tue Jul 25 17:11:45 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 25 Jul 2017 12:11:45 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500973334.4387.201.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: <88035.1500999105@turing-police.cc.vt.edu> On Tue, 25 Jul 2017 10:02:14 +0100, Jonathan Buzzard said: > I would be tempted to zip up the directories and move them ziped ;-) Not an option, unless you want to come here and re-write the researcher's tracking systems that knows where they archived a given run, and teach it "Except now it's in a .tar.gz in that directory, or perhaps one or two directories higher up, under some name". Yes, researchers do that. And as the joke goes: "What's the difference between a tenured professor and a terrorist?" "You can negotiate with a terrorist..." Plus remember that most of these directories are currently scattered across multiple tapes, which means "zip up a directory" may mean reading as many as 10 to 20 tapes just to get the directory on disk so you can zip it up. As it is, I had to write code that recall and processes all the files on tape 1, *wherever they are in the file system*, free them from the source disk, recall and process all the files on tape 2, repeat until tape 3,857. (And due to funding issues 5 years ago which turned into a "who paid for what tapes" food fight, most of the tapes ended up with files from entirely different file systems on them, going into different filesets on the destination). (And in fact, the migration is currently hosed up because a researcher *is* doing pretty much that - recalling all the files from one directory, then the next, then the next, to get files they need urgently for a deliverable but haven't been moved to the new system. So rather than having 12 LTO-5 drives to multistream the tape recalls, I've got 12 recalls fighting for one drive while the researcher's processing is hogging the other 11, due to the way the previous system prioritizes in-line opens of files versus bulk recalls) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From scbatche at us.ibm.com Tue Jul 25 21:46:45 2017 From: scbatche at us.ibm.com (Scott C Batchelder) Date: Tue, 25 Jul 2017 15:46:45 -0500 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf Message-ID: Hello: I am wondering if I can get some more information on the gpfsperf tool for baseline testing GPFS. I want to record GPFS read and write performance for a file system on the cluster before I enable DMAPI and configure the HSM interface. The README for the tool does not offer much insight in how I should run this tool based on the cluster or file system settings. The cluster that I will be running this tool on will not have MPI installed and will have multiple file systems in the cluster. Are there some best practises for running this tool? 
For example: - Should the number of threads equal the number of NSDs for the file system? or equal to the number of nodes? - If I execute a large multi-threaded run of this tool from a single node in the cluster, will that give me an accurate result of the performance of the file system? Any feedback is appreciated. Thanks. Sincerely, Scott Batchelder Phone: 1-281-883-7926 E-mail: scbatche at us.ibm.com 12301 Kurland Dr Houston, TX 77034-4812 United States -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 2022 bytes Desc: not available URL: From valdis.kletnieks at vt.edu Wed Jul 26 00:59:08 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 25 Jul 2017 19:59:08 -0400 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf In-Reply-To: References: Message-ID: <13777.1501027148@turing-police.cc.vt.edu> On Tue, 25 Jul 2017 15:46:45 -0500, "Scott C Batchelder" said: > - Should the number of threads equal the number of NSDs for the file > system? or equal to the number of nodes? Depends on what definition of "throughput" you are interested in. If your configuration has 50 clients banging on 5 NSD servers, your numbers for 5 threads and 50 threads are going to tell you subtly different things... (Basically, one thread per NSD is going to tell you the maximum that one client can expect to get with little to no contention, while one per client will tell you about the maximum *aggregate* that all 50 can get together - which is probably still giving each individual client less throughput than one-to-one....) We usually test with "exactly one thread total", "one thread per server", and "keep piling the clients on till the total number doesn't get any bigger". Also be aware that it only gives you insight to your workload performance if your workload is comprised of large file access - if your users are actually doing a lot of medium or small files, that changes the results dramatically as you end up possibly pounding on metadata more than the actual data.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From varun.mittal at in.ibm.com Wed Jul 26 04:42:27 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Wed, 26 Jul 2017 09:12:27 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. 
Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From varun.mittal at in.ibm.com Wed Jul 26 04:44:24 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Wed, 26 Jul 2017 09:14:24 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. 
this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Wed Jul 26 18:28:55 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Jul 2017 17:28:55 +0000 Subject: [gpfsug-discuss] Lost disks Message-ID: I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it's due to a back end disk issue or if it's a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn't appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren't 100% sure that something at the disk array couldn't have caused this. Is there an easy way to see if there is still data on these disks? Short of a full restore from backup what other options might they have? The mmlsnsd -X show's blanks for device and device type now. # mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- INGEST_FILEMGR_xis2301 0A23982E57FD995D - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2301 0A23982E57FD995D - - ingest-filemgr02.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - ingest-filemgr02.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2303 0A23982E57FD9962 - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. 
If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you.

Sirius Computer Solutions

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kums at us.ibm.com Wed Jul 26 18:37:45 2017
From: kums at us.ibm.com (Kumaran Rajaram)
Date: Wed, 26 Jul 2017 13:37:45 -0400
Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf
In-Reply-To: <13777.1501027148@turing-police.cc.vt.edu>
References: <13777.1501027148@turing-police.cc.vt.edu>
Message-ID:

Hi Scott,

>>- Should the number of threads equal the number of NSDs for the file system? or equal to the number of nodes?
>>- If I execute a large multi-threaded run of this tool from a single node in the cluster, will that give me an accurate result of the performance of the file system?

To add to Valdis's note, the answer to the above also depends on the node, the network used for GPFS communication between client and server, and the storage performance capabilities constituting the GPFS cluster/network/storage stack.

As an example, say the storage subsystem (including controller + disks) hosting the file system can deliver ~20 GB/s and the networking between NSD client and server is FDR 56Gb/s Infiniband (with verbsRdma = ~6GB/s). Assuming one FDR-IB link (verbsPorts) is configured per NSD server as well as per client, you would need a minimum of 4 x NSD servers (4 x 6GB/s ==> 24 GB/s) to saturate the backend storage. So you would need to run gpfsperf (or any other parallel I/O benchmark) across a minimum of 4 x GPFS NSD clients to saturate the backend storage. You can scale the gpfsperf thread counts (-th parameter) depending on access pattern (buffered/dio etc.), but this would only be able to drive load from a single NSD client node.

If you would like to drive I/O load from multiple NSD client nodes and synchronize the parallel runs across multiple nodes for accuracy, then gpfsperf-mpi would be strongly recommended. You would need to use MPI to launch gpfsperf-mpi across multiple NSD client nodes and scale the MPI processes (across NSD clients, with 1 or more MPI processes per NSD client) accordingly to drive the I/O load for good performance.

>>The cluster that I will be running this tool on will not have MPI installed and will have multiple file systems in the cluster.

Without MPI, the alternative would be to use ssh or pdsh to launch gpfsperf across multiple nodes; however, if there are slow NSD clients then the results may not be accurate (slow clients take longer, and after the faster clients finish they get all the network/storage resources, skewing the performance analysis).

You may also consider using parallel Iozone, as it can be run across multiple nodes using rsh/ssh with a combination of the "-+m" and "-t" options.

http://iozone.org/docs/IOzone_msword_98.pdf

##
-+m filename
Use this file to obtain the configuration information of the clients for cluster testing. The file contains one line for each client. Each line has three fields. The fields are space delimited. A # sign in column zero is a comment line. The first field is the name of the client. The second field is the path, on the client, for the working directory where Iozone will execute. The third field is the path, on the client, for the executable Iozone. To use this option one must be able to execute commands on the clients without being challenged for a password. Iozone will start remote execution by using "rsh". To use ssh, export RSH=/usr/bin/ssh

-t #
Run Iozone in a throughput mode. This option allows the user to specify how many threads or processes to have active during the measurement.
##

Hope this helps,
-Kums

From: valdis.kletnieks at vt.edu
To: gpfsug main discussion list
Date: 07/25/2017 07:59 PM
Subject: Re: [gpfsug-discuss] Baseline testing GPFS with gpfsperf
Sent by: gpfsug-discuss-bounces at spectrumscale.org

On Tue, 25 Jul 2017 15:46:45 -0500, "Scott C Batchelder" said:
> - Should the number of threads equal the number of NSDs for the file
> system? or equal to the number of nodes?

Depends on what definition of "throughput" you are interested in. If your configuration has 50 clients banging on 5 NSD servers, your numbers for 5 threads and 50 threads are going to tell you subtly different things... (Basically, one thread per NSD is going to tell you the maximum that one client can expect to get with little to no contention, while one per client will tell you about the maximum *aggregate* that all 50 can get together - which is probably still giving each individual client less throughput than one-to-one....)

We usually test with "exactly one thread total", "one thread per server", and "keep piling the clients on till the total number doesn't get any bigger".

Also be aware that it only gives you insight to your workload performance if your workload is comprised of large file access - if your users are actually doing a lot of medium or small files, that changes the results dramatically as you end up possibly pounding on metadata more than the actual data....

[attachment "att0twxd.dat" deleted by Kumaran Rajaram/Arlington/IBM]
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Robert.Oesterlin at nuance.com Wed Jul 26 18:45:35 2017
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Wed, 26 Jul 2017 17:45:35 +0000
Subject: [gpfsug-discuss] Lost disks
Message-ID:

One way this could possibly happen would be a system being installed (I'm assuming this is Linux) while the FC adapter is active; the OS install will then see the disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening.)

If you don't lose all of the descriptors, it's sometimes possible to manually re-construct the missing header information - I'm assuming since you opened a PMR, IBM has looked at this. This is a scenario I've had to recover from - twice.

Back-end array issue seems unlikely to me; I'd keep looking at the systems with access to those LUNs and see what commands/operations could have been run.
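On the "is there still data on these disks" question, a cautious, read-only first look is sketched below; /dev/sdX is a placeholder, the exact strings present depend on the NSD format version, and anything beyond looking should go through IBM support:

# Read-only: scan the start of the LUN for recognisable NSD/GPFS markers
dd if=/dev/sdX bs=1M count=8 2>/dev/null | strings | grep -Ei 'nsd|gpfs|descriptor' | head

# What GPFS currently believes about the NSDs
/usr/lpp/mmfs/bin/mmlsnsd -X

# Whether something else (e.g. an OS installer) has written a new label or partition table
blkid /dev/sdX

On many levels there is also an unsupported "mmfsadm test readdescraw /dev/sdX" that attempts to decode whatever descriptors it can still find; treat its output (and mmfsadm in general) as diagnostic-only.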
Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: on behalf of Mark Bush
Reply-To: gpfsug main discussion list
Date: Wednesday, July 26, 2017 at 12:29 PM
To: "gpfsug-discuss at spectrumscale.org"
Subject: [EXTERNAL] [gpfsug-discuss] Lost disks

I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it's due to a back end disk issue or if it's a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn't appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren't 100% sure that something at the disk array couldn't have caused this.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From oehmes at gmail.com Wed Jul 26 19:18:38 2017
From: oehmes at gmail.com (Sven Oehme)
Date: Wed, 26 Jul 2017 18:18:38 +0000
Subject: [gpfsug-discuss] Lost disks
In-Reply-To: References: Message-ID:

It can happen for multiple reasons; one is a Linux install, but unfortunately there are significantly simpler explanations. Linux, as well as the BIOS in some servers, will from time to time look for empty disks and put a GPT label on a disk if it doesn't have one, etc. This thread explains a lot of this:

https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014439222

This is why we implemented the NSD V2 format a long time ago. Unfortunately there is no way to convert a V1 NSD to a V2 NSD on an existing filesystem, except to remove the NSDs one at a time and re-add them after you have upgraded the system to at least GPFS 4.1 (I would recommend a later version like 4.2.3). Some more details are here in this thread:

https://www.ibm.com/developerworks/community/forums/html/threadTopic?id=5c1ee5bc-41b8-4318-a74e-4d962f82ce2e

But a quick summary of the benefits of V2 are:

- Support for GPT NSD
- Adds a standard disk partition table (GPT type) to NSDs
- Disk label support for Linux
- New GPFS NSD v2 format provides the following benefits:
  - Includes a partition table so that the disk is recognized as a GPFS device
  - Adjusts data alignment to support disks with a 4 KB physical block size
  - Adds backup copies of some key GPFS data structures
  - Expands some reserved areas to allow for future growth

The main reason we can't convert from V1 to V2 is that the on-disk format changed significantly, so we would have to move on-disk data, which is very risky.

Hope that explains this.

Sven

On Wed, Jul 26, 2017 at 10:29 AM Mark Bush wrote:

> I have a client has had an issue where all of the nsd disks disappeared in
> the cluster recently. Not sure if it's due to a back end disk issue or if
> it's a reboot that did it. But in their PMR they were told that all that
> data is lost now and that the disk headers didn't appear as GPFS disk
> headers. How on earth could something like that happen? Could it be a
> backend disk thing? They are confident that nobody tried to reformat disks
> but aren't 100% sure that something at the disk array couldn't have caused
> this.
>
> Is there an easy way to see if there is still data on these disks?
>
> Short of a full restore from backup what other options might they have?
>
> The mmlsnsd -X shows blanks for device and device type now.
> > > > # mmlsnsd -X > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > INGEST_FILEMGR_xis2301 0A23982E57FD995D - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2301 0A23982E57FD995D - - > ingest-filemgr02.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - > ingest-filemgr02.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2303 0A23982E57FD9962 - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > > > > > *Mark* > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > Sirius Computer Solutions > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Jul 26 19:19:15 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Jul 2017 18:19:15 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. 
But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 26 20:05:59 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 26 Jul 2017 19:05:59 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: IBM has a procedure for it that may work in some cases, but you?re manually editing the NSD descriptors on disk. Contact IBM if you think an NSD has been lost to descriptor being re-written. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 1:19 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Jul 27 11:39:28 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 27 Jul 2017 10:39:28 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: Mark, I once rescued a system which had the disk partition on the OS disks deleted. (This was a system with a device mapper RAID pair of OS disks). Download a copy of sysrescue http://www.system-rescue-cd.org/ and create a bootable USB stick (or network boot). When you boot the system in sysrescue it has a utility to scan disks which will identify existing partitions, even if the partition table has been erased. I can?t say if this will do anything with the disks in your system, but this is certainly worth a try if you suspect that the data is all still on disk. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: Wednesday, July 26, 2017 8:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. 
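If I remember right, the partition scanner that ships on that rescue CD is testdisk. A non-destructive first look at what it can still see on a disk would be something like:

testdisk /list /dev/sdX    # lists whatever partition structures testdisk can find, without writing anything

where /dev/sdX is only a placeholder for the LUN in question.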
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Thu Jul 27 11:58:08 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 11:58:08 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: <1501153088.26563.39.camel@buzzard.me.uk> On Wed, 2017-07-26 at 17:45 +0000, Oesterlin, Robert wrote: > One way this could possible happen would be a system is being > installed (I?m assuming this is Linux) and the FC adapter is active; > then the OS install will see disks and wipe out the NSD descriptor on > those disks. (Which is why the NSD V2 format was invented, to prevent > this from happening) If you don?t lose all of the descriptors, it?s > sometimes possible to manually re-construct the missing header > information - I?m assuming since you opened a PMR, IBM has looked at > this. This is a scenario I?ve had to recover from - twice. Back-end > array issue seems unlikely to me, I?d keep looking at the systems with > access to those LUNs and see what commands/operations could have been > run. I would concur that this is the most likely scenario; an install where for whatever reason the machine could see the disks and they are gone. I know that RHEL6 and its derivatives will do that for you. Has happened to me at previous place of work where another admin forgot to de-zone a server, went to install CentOS6 as part of a cluster upgrade from CentOS5 and overwrote all the NSD descriptors. Thing is GPFS does not look at the NSD descriptors that much. So in my case it was several days before it was noticed, and only then because I rebooted the last NSD server as part of a rolling upgrade of GPFS. I could have cruised for weeks/months with no NSD descriptors if I had not restarted all the NSD servers. The moral of this is the overwrite could have take place quite some time ago. Basically if the disks are all missing then the NSD descriptor has been overwritten, and the protestations of the client are irrelevant. The chances of the disk array doing it to *ALL* the disks is somewhere around ? IMHO. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From richard.rupp at us.ibm.com Thu Jul 27 12:28:35 2017 From: richard.rupp at us.ibm.com (RICHARD RUPP) Date: Thu, 27 Jul 2017 07:28:35 -0400 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: If you are under IBM support, leverage IBM for help. A third party utility has the possibility of making it worse. From: John Hearns To: gpfsug main discussion list Date: 07/27/2017 06:40 AM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org Mark, I once rescued a system which had the disk partition on the OS disks deleted. (This was a system with a device mapper RAID pair of OS disks). Download a copy of sysrescue http://www.system-rescue-cd.org/ and create a bootable USB stick (or network boot). When you boot the system in sysrescue it has a utility to scan disks which will identify existing partitions, even if the partition table has been erased. I can?t say if this will do anything with the disks in your system, but this is certainly worth a try if you suspect that the data is all still on disk. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: Wednesday, July 26, 2017 8:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. 
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush < Mark.Bush at siriuscom.com> Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Thu Jul 27 12:58:50 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 12:58:50 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: <1501156730.26563.49.camel@strath.ac.uk> On Thu, 2017-07-27 at 07:28 -0400, RICHARD RUPP wrote: > If you are under IBM support, leverage IBM for help. A third party > utility has the possibility of making it worse. > The chances of recovery are slim in the first place from this sort of problem. At least with v1 NSD descriptors. Further IBM have *ALREADY* told him the data is lost, I quote But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. So in this scenario you have little to loose trying something because you are now on your own. Worst case scenario is that whatever you try does not work, which leave you no worse of than you are now. Well apart from lost time for the restore, but you might have started that already to somewhere else. I was once told by IBM (nine years ago now) that my GPFS file system was caput and to arrange a restore from tape. At which point some fiddling by myself fixed the problem and a 100TB restore was no longer required. However this was not due to overwritten NSD descriptors. When that happened the two file systems effected had to be restored. Well bizarrely one was still mounted and I was able to rsync the data off. However the point is that at this stage fiddling with third party tools is the only option left. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From UWEFALKE at de.ibm.com Thu Jul 27 15:18:02 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 27 Jul 2017 16:18:02 +0200 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501156730.26563.49.camel@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> Message-ID: "Just doing something" makes things worse usually. Whether a 3rd party tool knows how to handle GPFS NSDs can be doubted (as long as it is not dedicated to that purpose). First, I'd look what is actually on the sectors where the NSD headers used to be, and try to find whether data beyond that area were also modified (if the latter is the case, restoring the NSDs does not make much sense as data and/or metadata (depending on disk usage) would also be corrupted. If you are sure that just the NSD header area has been affected, you might try to trick GPFS in getting just the information into the header area needed that GPFS recognises the devices as the NSDs they were. 
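Purely as an illustrative sketch, and with the device and file names below being placeholders rather than anything from the affected cluster, the relevant sectors can be captured and inspected read-only before anything is amended:

dd if=/dev/dm-0 of=/tmp/dm-0.head bs=512 count=8     # copy the first 4 KiB of the LUN to a file; nothing is written to the disk
strings /tmp/dm-0.head                               # an intact v1 header contains a readable "NSD descriptor ... created by GPFS" line
od --address-radix=x -xc /tmp/dm-0.head              # full hex/character dump for closer inspection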
The first 4 kiB of a v1 NSD from a VM on my laptop look like $ cat nsdv1head | od --address-radix=x -xc 000000 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000200 cf70 4192 0000 0100 0000 3000 e930 a028 p 317 222 A \0 \0 \0 001 \0 \0 \0 0 0 351 ( 240 000210 a8c0 ce7a a251 1f92 a251 1a92 0000 0800 300 250 z 316 Q 242 222 037 Q 242 222 032 \0 \0 \0 \b 000220 0000 f20f 0000 0000 0000 0000 0000 0000 \0 \0 017 362 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000230 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000400 93d2 7885 0000 0100 0000 0002 141e 64a8 322 223 205 x \0 \0 \0 001 \0 \0 002 \0 036 024 250 d 000410 a8c0 ce7a a251 3490 0000 fa0f 0000 0800 300 250 z 316 Q 242 220 4 \0 \0 017 372 \0 \0 \0 \b 000420 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000480 534e 2044 6564 6373 6972 7470 726f 6620 N S D d e s c r i p t o r f 000490 726f 2f20 6564 2f76 6476 2062 7263 6165 o r / d e v / v d b c r e a 0004a0 6574 2064 7962 4720 4650 2053 6f4d 206e t e d b y G P F S M o n 0004b0 614d 2079 3732 3020 3a30 3434 303a 2034 M a y 2 7 0 0 : 4 4 : 0 4 0004c0 3032 3331 000a 0000 0000 0000 0000 0000 2 0 1 3 \n \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 0004d0 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000e00 4c5f 4d56 0000 017d 0000 017d 0000 017d _ L V M \0 \0 } 001 \0 \0 } 001 \0 \0 } 001 000e10 0000 017d 0000 0000 0000 0000 0000 0000 \0 \0 } 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000e20 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000e30 0000 0000 0000 0000 0000 0000 017d 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 } 001 \0 \0 000e40 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 001000 I suppose, the important area starts at 0x0200 (ie. with the second 512Byte sector) and ends at 0x04df (which would be within the 3rd 512Bytes sector, hence the 2nd and 3rd sectors appear crucial). I think that there is some more space before the payload area starts. Without knowledge what exactly has to go into the header, I'd try to create an NSD on one or two (new) disks, save the headers, then create an FS on them, save the headers again, check if anything has changed. So, creating some new NSDs, checking what keys might appear there and in the cluster configuration could get you very close to craft the header information which is gone. Of course, that depends on how dear the data on the gone FS AKA SG are and how hard it'd be to rebuild them otherwise (replay from backup, recalculate, ...) It seems not a bad idea to set aside the NSD headers of your NSDs in a back up :-) And also now: Before amending any blocks on your disks, save them! Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/27/2017 01:59 PM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org On Thu, 2017-07-27 at 07:28 -0400, RICHARD RUPP wrote: > If you are under IBM support, leverage IBM for help. A third party > utility has the possibility of making it worse. > The chances of recovery are slim in the first place from this sort of problem. At least with v1 NSD descriptors. Further IBM have *ALREADY* told him the data is lost, I quote But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. So in this scenario you have little to loose trying something because you are now on your own. Worst case scenario is that whatever you try does not work, which leave you no worse of than you are now. Well apart from lost time for the restore, but you might have started that already to somewhere else. I was once told by IBM (nine years ago now) that my GPFS file system was caput and to arrange a restore from tape. At which point some fiddling by myself fixed the problem and a 100TB restore was no longer required. However this was not due to overwritten NSD descriptors. When that happened the two file systems effected had to be restored. Well bizarrely one was still mounted and I was able to rsync the data off. However the point is that at this stage fiddling with third party tools is the only option left. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Thu Jul 27 16:09:31 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 16:09:31 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: <1501156730.26563.49.camel@strath.ac.uk> Message-ID: <1501168171.26563.56.camel@strath.ac.uk> On Thu, 2017-07-27 at 16:18 +0200, Uwe Falke wrote: > "Just doing something" makes things worse usually. Whether a 3rd > party tool knows how to handle GPFS NSDs can be doubted (as long as it > is not dedicated to that purpose). It might usually, but IBM have *ALREADY* given up in this case and told the customer their data is toast. Under these circumstances other than wasting time that could have been spent profitably on a restore it is *IMPOSSIBLE* to make the situation worse. [SNIP] > It seems not a bad idea to set aside the NSD headers of your NSDs in a > back up :-) > And also now: Before amending any blocks on your disks, save them! > It's called NSD v2 descriptor format, so rather than use raw disks they are in a GPT partition, and for good measure a backup copy is stored at the end of the disk too. Personally if I had any v1 NSD's in a file system I would have a plan for a series of mmdeldisk/mmcrnsd/mmadddisk to get them all to v2 sooner rather than later. JAB. -- Jonathan A. 
Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Robert.Oesterlin at nuance.com Thu Jul 27 16:28:02 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 27 Jul 2017 15:28:02 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format each is? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jul 27 16:51:29 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 27 Jul 2017 17:51:29 +0200 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501168171.26563.56.camel@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> Message-ID: gpfsug-discuss-bounces at spectrumscale.org wrote on 07/27/2017 05:09:31 PM: > From: Jonathan Buzzard > To: gpfsug main discussion list > Date: 07/27/2017 05:09 PM > Subject: Re: [gpfsug-discuss] Lost disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On Thu, 2017-07-27 at 16:18 +0200, Uwe Falke wrote: > > > "Just doing something" makes things worse usually. Whether a 3rd > > party tool knows how to handle GPFS NSDs can be doubted (as long as it > > is not dedicated to that purpose). > > It might usually, but IBM have *ALREADY* given up in this case and told > the customer their data is toast. Under these circumstances other than > wasting time that could have been spent profitably on a restore it is > *IMPOSSIBLE* to make the situation worse. SCNR: It is always possible to make things worse. However, of course, if the efforts to do research on that system appear too expensive compared to the possible gain, then it is wise to give up and restore data from backup to a new file system. > > [SNIP] > > > It seems not a bad idea to set aside the NSD headers of your NSDs in a > > back up :-) > > And also now: Before amending any blocks on your disks, save them! > > > > It's called NSD v2 descriptor format, so rather than use raw disks they > are in a GPT partition, and for good measure a backup copy is stored at > the end of the disk too. > > Personally if I had any v1 NSD's in a file system I would have a plan > for a series of mmdeldisk/mmcrnsd/mmadddisk to get them all to v2 sooner > rather than later. Yep, but I suppose the gone NSDs were v1. Then, there might be some restrictions blocking the move from NSDv1 to NSDv2 (old FS level still req.ed, or just the hugeness of a file system). And you never know, if some tool runs wild due to logical failures it overwrites all GPT copies on a disk and you're lost again (but of course NSDv2 has been a tremendous step ahead). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From luke.raimbach at googlemail.com Thu Jul 27 17:09:42 2017 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Thu, 27 Jul 2017 16:09:42 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> References: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: mmfsadm test readdescraw On Thu, 27 Jul 2017, 16:28 Oesterlin, Robert, wrote: > I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format > each is? > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Jul 27 17:17:20 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 27 Jul 2017 16:17:20 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <50669E00-32A8-4AC7-A729-CB961F96ECAE@nuance.com> Right - but what field do I look at? Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Luke Raimbach Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 11:10 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? mmfsadm test readdescraw -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Jul 27 19:26:45 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 19:26:45 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> Message-ID: <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> On 27/07/17 16:51, Uwe Falke wrote: [SNIP] > SCNR: It is always possible to make things worse. > However, of course, if the efforts to do research on that system appear > too expensive compared to the possible gain, then it is wise to give up > and restore data from backup to a new file system. > Explain to me when IBM have washed their hands of the situation; that is they deem the file system unrecoverable and will take no further action to help the customer, how under these circumstances it is possible for it to get any worse attempting to recover the situation yourself? The answer is you can't so and are talking complete codswallop. In general you are right, in this situation you are utterly and totally wrong. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From chair at spectrumscale.org Thu Jul 27 21:19:15 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Thu, 27 Jul 2017 21:19:15 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> Message-ID: Guys, this is supposed to be a community mailing list where people can come and ask questions and we can have healthy debate, but please can we keep it calm? Thanks Simon Group Chair From sfadden at us.ibm.com Thu Jul 27 21:33:19 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Thu, 27 Jul 2017 20:33:19 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: References: , <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Jul 28 00:29:47 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 28 Jul 2017 00:29:47 +0100 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> References: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: On 27/07/17 16:28, Oesterlin, Robert wrote: > I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format > each is? Well on anything approaching a recent Linux lsblk should as I understand it should show GPT partitions on v2 NSD's. Normally a v1 NSD would show up as a raw block device. I guess you could have created the v1 NSD's inside a partition but that was not normal practice. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From daniel.kidger at uk.ibm.com Fri Jul 28 12:03:40 2017 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Fri, 28 Jul 2017 11:03:40 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: , <1501156730.26563.49.camel@strath.ac.uk><1501168171.26563.56.camel@strath.ac.uk><3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 28 12:46:47 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 28 Jul 2017 11:46:47 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Scott Fadden Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 3:33 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? 
# mmfsadm test readdescraw /dev/dm-14 | grep " original format" original format version 1600, cur version 1700 (mgr 1700, helper 1700, mnode 1700) The harder part is what version number = v2 and what matches version 1. The real answer is there is not a simple one, it is not really v1 vs v2 it is what feature you are interested in. Just one small example 4K Disk SECTOR support started in 1403 Dynamically enabling quotas started in 1404 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 28 13:44:11 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 28 Jul 2017 12:44:11 +0000 Subject: [gpfsug-discuss] LROC example Message-ID: <8103C497-EFA2-41E3-A047-4C3A3AA3EC0B@nuance.com> For those of you considering LROC, you may find this interesting. LROC can be very effective in some job mixes, as shown below. This is in a compute cluster of about 400 nodes. Each compute node has a 100GB LROC. In this particular job mix, LROC was recalling 3-4 times the traffic that was going to the NSDs. I see other cases where?s it?s less effective. [cid:image001.png at 01D30775.4ACF3D20] Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 54425 bytes Desc: image001.png URL: From knop at us.ibm.com Fri Jul 28 13:44:26 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 28 Jul 2017 08:44:26 -0400 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> References: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Message-ID: Bob, I believe the NSD format version (v1 vs v2) is shown in the " format version" line that starts with "NSDid" : # mmfsadm test readdescraw /dev/dm-11 NSD descriptor in sector 64 of /dev/dm-11 NSDid: 9461C0A85788693A format version: 1403 Label: It should say "1403" when the format is v2. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 07/28/2017 07:47 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Scott Fadden Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 3:33 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? 
# mmfsadm test readdescraw /dev/dm-14 | grep " original format" original format version 1600, cur version 1700 (mgr 1700, helper 1700, mnode 1700) The harder part is what version number = v2 and what matches version 1. The real answer is there is not a simple one, it is not really v1 vs v2 it is what feature you are interested in. Just one small example 4K Disk SECTOR support started in 1403 Dynamically enabling quotas started in 1404 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gcorneau at us.ibm.com Fri Jul 28 20:07:54 2017 From: gcorneau at us.ibm.com (Glen Corneau) Date: Fri, 28 Jul 2017 14:07:54 -0500 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: References: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Message-ID: Just a note for my AIX folks out there (and I know there's at least one!): When NSDv2 (version 1403) disks are defined in AIX we *don't* create GPTs on those LUNs. However with GPFS (Spectrum Scale) installed on AIX we will place the NSD name in the "VG" column of lsvg. But yes, we've had situations of customers creating new VGs on existing GPFS LUNs (force!) and destroying file systems. ------------------ Glen Corneau Power Systems Washington Systems Center gcorneau at us.ibm.com From: "Felipe Knop" To: gpfsug main discussion list Date: 07/28/2017 07:45 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Bob, I believe the NSD format version (v1 vs v2) is shown in the " format version" line that starts with "NSDid" : # mmfsadm test readdescraw /dev/dm-11 NSD descriptor in sector 64 of /dev/dm-11 NSDid: 9461C0A85788693A format version: 1403 Label: It should say "1403" when the format is v2. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 07/28/2017 07:47 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Sun Jul 30 04:22:25 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Sat, 29 Jul 2017 23:22:25 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? 
In-Reply-To: <1500908233.4387.194.camel@buzzard.me.uk> References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> <1500908233.4387.194.camel@buzzard.me.uk> Message-ID: Jonathan, all, We'll be introducing some clarification into the publications to highlight that data is not stored in the inode for encrypted files. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/24/2017 10:57 AM Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Sent by: gpfsug-discuss-bounces at spectrumscale.org On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: > Hey all, > > On the documentation of encryption restrictions and encryption/HAWC > interplay... > > The encryption documentation currently states: > > "Secure storage uses encryption to make data unreadable to anyone who > does not possess the necessary encryption keys...Only data, not > metadata, is encrypted." > > The HAWC restrictions include: > > "Encrypted data is never stored in the recovery log..." > > If this is unclear, I'm open to suggestions for improvements. > Just because *DATA* is stored in the metadata does not make it magically metadata. It's still data so you could quite reasonably conclude that it is encrypted. We have now been disabused of this, but the documentation is not clear and needs clarifying. Perhaps say metadata blocks are not encrypted. Or just a simple data stored in inodes is not encrypted would suffice. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Jul 31 05:57:44 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 31 Jul 2017 00:57:44 -0400 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501153088.26563.39.camel@buzzard.me.uk> References: <1501153088.26563.39.camel@buzzard.me.uk> Message-ID: Jonathan, Regarding >> Thing is GPFS does not look at the NSD descriptors that much. So in my >> case it was several days before it was noticed, and only then because I >> rebooted the last NSD server as part of a rolling upgrade of GPFS. I >> could have cruised for weeks/months with no NSD descriptors if I had not >> restarted all the NSD servers. The moral of this is the overwrite could >> have take place quite some time ago. While GPFS does not normally read the NSD descriptors in the course of performing file system operations, as of 4.1.1 a periodic check is done on the content of various descriptors, and a message like [E] On-disk NSD descriptor of is valid but has a different ID. 
ID in cache is and ID on-disk is should get issued if the content of the descriptor on disk changes. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/27/2017 06:58 AM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, 2017-07-26 at 17:45 +0000, Oesterlin, Robert wrote: > One way this could possible happen would be a system is being > installed (I?m assuming this is Linux) and the FC adapter is active; > then the OS install will see disks and wipe out the NSD descriptor on > those disks. (Which is why the NSD V2 format was invented, to prevent > this from happening) If you don?t lose all of the descriptors, it?s > sometimes possible to manually re-construct the missing header > information - I?m assuming since you opened a PMR, IBM has looked at > this. This is a scenario I?ve had to recover from - twice. Back-end > array issue seems unlikely to me, I?d keep looking at the systems with > access to those LUNs and see what commands/operations could have been > run. I would concur that this is the most likely scenario; an install where for whatever reason the machine could see the disks and they are gone. I know that RHEL6 and its derivatives will do that for you. Has happened to me at previous place of work where another admin forgot to de-zone a server, went to install CentOS6 as part of a cluster upgrade from CentOS5 and overwrote all the NSD descriptors. Thing is GPFS does not look at the NSD descriptors that much. So in my case it was several days before it was noticed, and only then because I rebooted the last NSD server as part of a rolling upgrade of GPFS. I could have cruised for weeks/months with no NSD descriptors if I had not restarted all the NSD servers. The moral of this is the overwrite could have take place quite some time ago. Basically if the disks are all missing then the NSD descriptor has been overwritten, and the protestations of the client are irrelevant. The chances of the disk array doing it to *ALL* the disks is somewhere around ? IMHO. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Mon Jul 31 18:30:34 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 31 Jul 2017 17:30:34 +0000 Subject: [gpfsug-discuss] Auditing Message-ID: Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? 
Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Mon Jul 31 18:44:21 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 17:44:21 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement Message-ID: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Mon Jul 31 18:54:52 2017 From: eric.wonderley at vt.edu (J. 
Eric Wonderley) Date: Mon, 31 Jul 2017 13:54:52 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar < Renar.Grunenberg at huk-coburg.de> wrote: > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the enforcement > of hardlimit definitions on a flieset quota. What we see is we put some 200 > GB files on following quota definitions: quota 150 GB Limit 250 GB Grace > none. > After the creating of one 200 GB we hit the softquota limit, thats ok. But > After the the second file was created!! we expect an io error but it don?t > happen. We define all well know Parameters (-Q,..) on the filesystem . Is > this a bug or a Feature? mmcheckquota are already running at first. > Regards Renar. > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ------------------------------ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > ------------------------------ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfosburg at mdanderson.org Mon Jul 31 18:56:46 2017 From: jfosburg at mdanderson.org (Fosburgh,Jonathan) Date: Mon, 31 Jul 2017 17:56:46 +0000 Subject: [gpfsug-discuss] Auditing In-Reply-To: References: Message-ID: At present there is not a method to audit file access. Jonathan Fosburgh Principal Application Systems Analyst Storage Team IT Operations jfosburg at mdanderson.org (713) 745-9346 On 07/31/2017 12:30 PM, Mark Bush wrote: Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? 
Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The information contained in this e-mail message may be privileged, confidential, and/or protected from disclosure. This e-mail message may contain protected health information (PHI); dissemination of PHI should comply with applicable federal and state laws. If you are not the intended recipient, or an authorized representative of the intended recipient, any further review, disclosure, use, dissemination, distribution, or copying of this message or any attachment (or the information contained therein) is strictly prohibited. If you think that you have received this e-mail message in error, please notify the sender by return e-mail and delete all references to it and its contents from your systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jul 31 19:02:30 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 31 Jul 2017 18:02:30 +0000 Subject: [gpfsug-discuss] Re Auditing Message-ID: We run a policy that looks like this: -- cut here -- define(daysToEpoch, days(timestamp('1970-01-01 00:00:00.0'))) define(unixTS, char(int( (( days(\$1) - daysToEpoch ) * 86400) + ( hour(\$1) * 3600) + (minute(\$1) * 60) + (second(\$1)) )) ) rule 'dumpall' list '"$filesystem"' DIRECTORIES_PLUS SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' ) -- cut here -- Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Monday, July 31, 2017 at 12:31 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Auditing Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Mon Jul 31 19:05:37 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 31 Jul 2017 18:05:37 +0000 Subject: [gpfsug-discuss] Re Auditing In-Reply-To: References: Message-ID: Brilliant. Thanks Bob. 
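For anyone wanting to try a rule like the one Bob posted: saved to a file, it can be driven in list-only mode with mmapplypolicy, roughly as in the sketch below. The file system name, policy file name and output prefix are placeholders, and depending on the release a matching EXTERNAL LIST rule may also be needed.

mmapplypolicy gpfs01 -P /tmp/dumpall.pol -I defer -f /tmp/dumpall    # writes the candidate list files under the /tmp/dumpall prefix without executing anything against the data

The resulting list file(s) contain one record per selected file with the SHOW() fields appended, which can then be fed into whatever reporting is needed.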
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Monday, July 31, 2017 1:03 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Re Auditing We run a policy that looks like this: -- cut here -- define(daysToEpoch, days(timestamp('1970-01-01 00:00:00.0'))) define(unixTS, char(int( (( days(\$1) - daysToEpoch ) * 86400) + ( hour(\$1) * 3600) + (minute(\$1) * 60) + (second(\$1)) )) ) rule 'dumpall' list '"$filesystem"' DIRECTORIES_PLUS SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' ) -- cut here -- Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Monday, July 31, 2017 at 12:31 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Auditing Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Jul 31 19:26:52 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 31 Jul 2017 14:26:52 -0400 Subject: [gpfsug-discuss] Re Auditing - timestamps In-Reply-To: References: Message-ID: The "ILM" chapter in the Admin Guide has some tips, among which: 18. You can convert a time interval value to a number of seconds with the SQL cast syntax, as in the following example: define([toSeconds],[(($1) SECONDS(12,6))]) define([toUnixSeconds],[toSeconds($1 - ?1970-1-1 at 0:00?)]) RULE external list b RULE list b SHOW(?sinceNow=? toSeconds(current_timestamp-modification_time) ) RULE external list c RULE list c SHOW(?sinceUnixEpoch=? toUnixSeconds(modification_time) ) The following method is also supported: define(access_age_in_days,( INTEGER(( (CURRENT_TIMESTAMP - ACCESS_TIME) SECONDS)) /(24*3600.0) ) ) RULE external list w exec ?? RULE list w weight(access_age_in_days) show(access_age_in_days) --marc of GPFS -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Mon Jul 31 19:46:53 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 31 Jul 2017 14:46:53 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: <20170731144653.160355y5whmerokd@support.scinet.utoronto.ca> Renar For as long as the usage is below the hard limit (space or inodes) and below the grace period you'll be able to write. I don't think you can set the grace period to an specific value as a quota parameter, such as none. That is set at the filesystem creation time. BTW, grace period limit has been a mystery to me for many years. My impression is that GPFS keeps changing it internally depending on the position of the moon. I think ours is 2 hours, but at times I can see users writing for longer. Jaime Quoting "Grunenberg, Renar" : > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the > enforcement of hardlimit definitions on a flieset quota. What we see > is we put some 200 GB files on following quota definitions: quota > 150 GB Limit 250 GB Grace none. > After the creating of one 200 GB we hit the softquota limit, thats > ok. But After the the second file was created!! we expect an io > error but it don?t happen. We define all well know Parameters > (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota > are already running at first. > Regards Renar. > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
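(For readers following this thread, the usual way to see what the daemon actually thinks the limits and grace period are, and to pull the in_doubt figures back toward zero, is something along these lines; "gpfs01" and "somefileset" are example names, not Renar's real ones, and the exact mmsetquota/mmedquota syntax should be checked against your 4.2.x man pages:)

-- cut here --
# Show current usage, soft/hard limits and grace for one fileset
mmlsquota -j somefileset gpfs01

# Re-count usage so the in_doubt column drops back to (near) zero
mmcheckquota gpfs01

# Set block soft and hard limits on the fileset (newer-style syntax)
mmsetquota gpfs01:somefileset --block 150G:250G

# The grace period is a per-filesystem default, edited interactively
# with mmedquota -t; it is not part of the per-fileset limit itself.
mmedquota -t -j
-- cut here --

(The behaviour under discussion: once usage passes the soft limit the grace clock starts, and writes only begin to fail when the hard limit is reached or the grace period expires, whichever comes first.)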
From Renar.Grunenberg at huk-coburg.de Mon Jul 31 20:04:56 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 19:04:56 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 
9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 31 20:21:46 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 31 Jul 2017 19:21:46 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar, I?m sure this is the case, but I don?t see anywhere in this thread where this is explicitly stated ? you?re not doing your tests as root, are you? root, of course, is not bound by any quotas. Kevin On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > wrote: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. 
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Mon Jul 31 20:30:20 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 19:30:20 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hallo Kevin, thanks for your hint i will check these tomorrow, and yes as root, lol. Regards Renar Renar Grunenberg Abteilung Informatik ? 
Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Buterbaugh, Kevin L Gesendet: Montag, 31. Juli 2017 21:22 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar, I?m sure this is the case, but I don?t see anywhere in this thread where this is explicitly stated ? you?re not doing your tests as root, are you? root, of course, is not bound by any quotas. Kevin On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > wrote: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. 
If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Mon Jul 31 21:03:53 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 31 Jul 2017 16:03:53 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> In addition, the in_doubt column is a function of the data turn-over and the internal gpfs accounting synchronization period (beyond root control). The higher the in_doubt values the less accurate the real amount of space/inodes a user/group/fileset has in the filesystem. What I noticed in practice is the the in_doubt values only get worst overtime, and work against the quotas, making them hit the limits sooner. Therefore, you may wish to run a 'mmcheckquota' crontab job once or twice a day, to reset the in_doubt column to zero mover often. GPFS has a very high lag to do this on its own in the most recent versions, and seldom really catches up on a very active filesystem. If your grace period is set to 7 days I can assure you that in an HPC environment it's the equivalent of not having quotas effectively. You should set it to 2 hours or 4 hours. In an environment such as ours a runway process can easily generate 500TB of data or 1 billion inodes in few hours, and choke the file system to all users/jobs. Jaime Quoting "Buterbaugh, Kevin L" : > Hi Renar, > > I?m sure this is the case, but I don?t see anywhere in this thread > where this is explicitly stated ? you?re not doing your tests as > root, are you? root, of course, is not bound by any quotas. > > Kevin > > On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > > > wrote: > > > Hallo J. Eric, hallo Jaime, > Ok after we hit the softlimit we see that the graceperiod are go to > 7 days. I think that?s the default. But was does it mean. > After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. > My interpretation now we can write many gb to the nospace-left event > in the filesystem. > But our intention is to restricted some application to write only to > the hardlimit in the fileset. Any hints to accomplish this? > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. 
> Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > > > > Von: > gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric > Wonderley > Gesendet: Montag, 31. Juli 2017 19:55 > An: gpfsug main discussion list > > > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement > > Hi Renar: > What does 'mmlsquota -j fileset filesystem' report? > I did not think you would get a grace period of none unless the > hardlimit=softlimit. > > On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > > > wrote: > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the > enforcement of hardlimit definitions on a flieset quota. What we see > is we put some 200 GB files on following quota definitions: quota > 150 GB Limit 250 GB Grace none. > After the creating of one 200 GB we hit the softquota limit, thats > ok. But After the the second file was created!! we expect an io > error but it don?t happen. We define all well know Parameters > (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota > are already running at first. > Regards Renar. > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > > Telefon: > > 09561 96-44110 > > Telefax: > > 09561 96-44104 > > E-Mail: > > Renar.Grunenberg at huk-coburg.de > > Internet: > > www.huk.de > > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. 
(MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 31 21:11:14 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 31 Jul 2017 20:11:14 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> Message-ID: <3789811E-523F-47AE-93F3-E2985DD84D60@vanderbilt.edu> Jaime, That?s heavily workload dependent. We run a traditional HPC cluster and have a 7 day grace on home and 14 days on scratch. By setting the soft and hard limits appropriately we?ve slammed the door on many a runaway user / group / fileset. YMMV? Kevin On Jul 31, 2017, at 3:03 PM, Jaime Pinto > wrote: If your grace period is set to 7 days I can assure you that in an HPC environment it's the equivalent of not having quotas effectively. You should set it to 2 hours or 4 hours. In an environment such as ours a runway process can easily generate 500TB of data or 1 billion inodes in few hours, and choke the file system to all users/jobs. Jaime ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Sat Jul 1 10:20:18 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Sat, 1 Jul 2017 10:20:18 +0100 Subject: [gpfsug-discuss] Mass UID migration suggestions In-Reply-To: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> References: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> Message-ID: On 30/06/17 16:20, hpc-luke at uconn.edu wrote: > Hello, > > We're trying to change most of our users uids, is there a clean way to > migrate all of one users files with say `mmapplypolicy`? We have to change the > owner of around 273539588 files, and my estimates for runtime are around 6 days. > > What we've been doing is indexing all of the files and splitting them up by > owner which takes around an hour, and then we were locking the user out while we > chown their files. I made it multi threaded as it weirdly gave a 10% speedup > despite my expectation that multi threading access from a single node would not > give any speedup. > > Generally I'm looking for advice on how to make the chowning faster. Would > spreading the chowning processes over multiple nodes improve performance? Should > I not stat the files before running lchown on them, since lchown checks the file > before changing it? I saw mention of inodescan(), in an old gpfsug email, which > speeds up disk read access, by not guaranteeing that the data is up to date. We > have a maintenance day coming up where all users will be locked out, so the file > handles(?) from GPFS's perspective will not be able to go stale. Is there a > function with similar constraints to inodescan that I can use to speed up this > process? My suggestion is to do some development work in C to write a custom program to do it for you. That way you can hook into the GPFS API to leverage the fast file system scanning API. Take a look at the tsbackup.C file in the samples directory. 
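(As a rough aside, the policy engine can at least produce and fan out the per-owner work lists that feed such a program. The sketch below is untested and the UID 4711, the script path and the node names are placeholders; the helper script itself, which would parse the pathnames out of the file lists it is handed and run lchown with the new UID, is not shown.)

-- cut here --
# Select one owner's files and hand them in batches to a helper script
cat > /tmp/chown-uid.pol <<'EOF'
/* script path and UID are placeholders */
RULE EXTERNAL LIST 'uidmove' EXEC '/usr/local/sbin/chown-batch.sh'
RULE 'pick' LIST 'uidmove' WHERE USER_ID = 4711
EOF

# Run the fast scan and spread the helper invocations over several nodes
mmapplypolicy gpfs01 -P /tmp/chown-uid.pol -N node1,node2,node3
-- cut here --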
Obviously this is going to require someone with appropriate coding skills to develop. On the other hand given it is a one off and input is strictly controlled so error checking is a one off, then couple hundred lines C tops. My tip for this would be load the new UID's into a sparse array so you can just use the current UID to index into the array for the new UID, for speeding things up. It burns RAM but these days RAM is cheap and plentiful and speed is the major consideration here. This should in theory be able to do this in a few hours with this technique. One thing to bear in mind is that once the UID change is complete you will have to backup the entire file system again. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From ilan84 at gmail.com Tue Jul 4 09:16:43 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 11:16:43 +0300 Subject: [gpfsug-discuss] Fail to mount file system Message-ID: Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I am trying to make it work. There are 2 nodes in a cluster: [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active The Cluster status is: [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: MyCluster.LH20-GPFS2 GPFS cluster id: 10777108240438931454 GPFS UID domain: MyCluster.LH20-GPFS2 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 There is a file system: [root at LH20-GPFS1 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- fs_gpfs01 nynsd1 (directly attached) fs_gpfs01 nynsd2 (directly attached) [root at LH20-GPFS1 ~]# On each Node, There is folder /fs_gpfs01 The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. Whilte executing mmmount i get exception: [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. What am i doing wrong ? From scale at us.ibm.com Tue Jul 4 09:36:43 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 14:06:43 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 lab"? Is the file system corrupted ? Maybe this error is then due to file system corruption. Can you once try: mmmount fs_gpfs01 -a If this does not work then try: mmmount -o rs fs_gpfs01 Let me know which mount is working. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: gpfsug-discuss at spectrumscale.org Date: 07/04/2017 01:47 PM Subject: [gpfsug-discuss] Fail to mount file system Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I am trying to make it work. There are 2 nodes in a cluster: [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active The Cluster status is: [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: MyCluster.LH20-GPFS2 GPFS cluster id: 10777108240438931454 GPFS UID domain: MyCluster.LH20-GPFS2 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 There is a file system: [root at LH20-GPFS1 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- fs_gpfs01 nynsd1 (directly attached) fs_gpfs01 nynsd2 (directly attached) [root at LH20-GPFS1 ~]# On each Node, There is folder /fs_gpfs01 The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. Whilte executing mmmount i get exception: [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. What am i doing wrong ? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 09:38:28 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 11:38:28 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: I mean the person tried to configure it... didnt do good job so now its me to continue On Jul 4, 2017 11:37, "IBM Spectrum Scale" wrote: > What exactly do you mean by "I have received existing corrupted GPFS > 4.2.2 lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. 
> > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > ------------------------------------------------------------ > --------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine > cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jul 4 11:54:52 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 4 Jul 2017 10:54:52 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Message-ID: Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... 
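(A quick way to see whether the protocol nodes have drifted apart in the meantime is to compare the installed package everywhere before planning the outage; "cesNodes" is the stock CES node class and may be named differently in your cluster:)

-- cut here --
# Compare gpfs.smb builds across all protocol nodes in one go
mmdsh -N cesNodes 'rpm -q gpfs.smb'
-- cut here --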
Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 11:56:20 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 13:56:20 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts From S.J.Thompson at bham.ac.uk Tue Jul 4 12:09:18 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 4 Jul 2017 11:09:18 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Message-ID: AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I?ve upgraded nodes one at a time over the course of a few days. 
Is the impact just that we won?t be supported, or will a hole open up beneath my feet and swallow me whole? I really don?t fancy the headache of getting approvals to get an outage of even 5 minutes at 6am?. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jul 4 12:12:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 4 Jul 2017 11:12:10 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 4 17:28:07 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 21:58:07 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: My bad gave the wrong command, the right one is: mmmount fs_gpfs01 -o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 4 17:46:17 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 4 Jul 2017 19:46:17 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Yes I am ok with deleting. I follow a guide from john olsen at the ibm team from tuscon.. but the guide had steps after the gpfs setup... Is there step by step guide for gpfs cluster setup other than the one in the ibm site? Thank My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------ ------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111- 0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system ------------------------------ [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. 
> > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. > There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > ------------------------------------------------------------ --------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcatana at gmail.com Tue Jul 4 17:47:09 2017 From: jcatana at gmail.com (Josh Catana) Date: Tue, 4 Jul 2017 12:47:09 -0400 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Check /var/adm/ras/mmfs.log.latest The dmesg xfs bug is probably from boot if you look at the dmesg with -T to show the timestamp On Jul 4, 2017 12:29 PM, "IBM Spectrum Scale" wrote: > My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs > > Also can you send output of mmlsnsd -X, need to check device type of the > NSDs. > > Are you ok with deleting the file system and disks and building everything > from scratch? > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. 
> > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: IBM Spectrum Scale > Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main > discussion list > Date: 07/04/2017 04:26 PM > Subject: Re: [gpfsug-discuss] Fail to mount file system > ------------------------------ > > > > [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a > Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... > LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmdsh: LH20-GPFS1 remote shell process had return code 32. > LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle > mmdsh: LH20-GPFS2 remote shell process had return code 32. > mmmount: Command failed. Examine previous error messages to determine > cause. > > [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 > mmmount: Mount point can not be a relative path name: rs > [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 > mmmount: Mount point can not be a relative path name: rs > > > > I recieve in "dmesg": > > [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk > [ 141.363422] hvt_cn_callback: unexpected netlink message! > [ 141.366153] hvt_cn_callback: unexpected netlink message! > [ 4479.292850] tracedev: loading out-of-tree module taints kernel. > [ 4479.292888] tracedev: module verification failed: signature and/or > required key missing - tainting kernel > [ 4482.928413] ------------[ cut here ]------------ > [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 > xfs_do_writepage+0x537/0x550 [xfs]() > [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) > tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 > mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils > i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc > binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif > crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc > hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy > libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod > [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE > ------------ 3.10.0-514.21.2.el7.x86_64 #1 > > On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale > wrote: > > What exactly do you mean by "I have received existing corrupted GPFS > 4.2.2 > > lab"? > > Is the file system corrupted ? Maybe this error is then due to file > system > > corruption. > > > > Can you once try: mmmount fs_gpfs01 -a > > If this does not work then try: mmmount -o rs fs_gpfs01 > > > > Let me know which mount is working. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------ > ------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > > (GPFS), then please post it to the public IBM developerWroks Forum at > > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) > > and you have an IBM software maintenance contract please contact > > 1-800-237-5511 <(800)%20237-5511> in the United States or your local > IBM Service Center in > > other countries. > > > > The forum is informally monitored as time permits and should not be used > for > > priority messages to the Spectrum Scale (GPFS) team. 
> > > > > > > > From: Ilan Schwarts > > To: gpfsug-discuss at spectrumscale.org > > Date: 07/04/2017 01:47 PM > > Subject: [gpfsug-discuss] Fail to mount file system > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ________________________________ > > > > > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > > am trying to make it work. > > There are 2 nodes in a cluster: > > [root at LH20-GPFS1 ~]# mmgetstate -a > > > > Node number Node name GPFS state > > ------------------------------------------ > > 1 LH20-GPFS1 active > > 3 LH20-GPFS2 active > > > > The Cluster status is: > > [root at LH20-GPFS1 ~]# mmlscluster > > > > GPFS cluster information > > ======================== > > GPFS cluster name: MyCluster.LH20-GPFS2 > > GPFS cluster id: 10777108240438931454 > > GPFS UID domain: MyCluster.LH20-GPFS2 > > Remote shell command: /usr/bin/ssh > > Remote file copy command: /usr/bin/scp > > Repository type: CCR > > > > Node Daemon node name IP address Admin node name Designation > > -------------------------------------------------------------------- > > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > > > There is a file system: > > [root at LH20-GPFS1 ~]# mmlsnsd > > > > File system Disk name NSD servers > > ------------------------------------------------------------ > --------------- > > fs_gpfs01 nynsd1 (directly attached) > > fs_gpfs01 nynsd2 (directly attached) > > > > [root at LH20-GPFS1 ~]# > > > > On each Node, There is folder /fs_gpfs01 > > The next step is to mount this fs_gpfs01 to be synced between the 2 > nodes. > > Whilte executing mmmount i get exception: > > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > > mmmount: Command failed. Examine previous error messages to determine > cause. > > > > > > What am i doing wrong ? > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > -- > > > - > Ilan Schwarts > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 4 19:15:49 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 4 Jul 2017 23:45:49 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: You can refer to the concepts, planning and installation guide at the link ( https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1xx_library_prodoc.htm ) for finding detailed steps on setting up a cluster or creating a file system. Or open a PMR and work with IBM support to set it up. 
In your case (just as an example) you can use the below simple steps to delete and recreate the file system: 1) To delete file system and NSDs: a) Unmount file system - mmumount -a b) Delete file system - mmdelfs c) Delete NSDs - mmdelnsd "nynsd1;nynsd2" 2) To create file system with both disks in one system pool and having dataAndMetadata and data and metadata replica and directly attached to the nodes, you can use following steps: a) Create a /tmp/nsd file and fill it up with below information :::dataAndMetadata:1:nynsd1:system :::dataAndMetadata:2:nynsd2:system b) Use mmcrnsd -F /tmp/nsd to create NSDs c) Create file system using (just an example with assumptions on config) - mmcrfs /dev/fs_gpfs01 -F /tmp/nsd -A yes -B 256K -n 32 -m 2 -r 2 -T /fs_gpfs01 You can refer to above guide for configuring it in other ways as you want. If you have any issues with these steps you can raise PMR and follow proper channel to setup file system as well. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 10:16 PM Subject: Re: [gpfsug-discuss] Fail to mount file system Yes I am ok with deleting. I follow a guide from john olsen at the ibm team from tuscon.. but the guide had steps after the gpfs setup... Is there step by step guide for gpfs cluster setup other than the one in the ibm site? Thank My bad gave the wrong command, the right one is: mmmount fs_gpfs01-o rs Also can you send output of mmlsnsd -X, need to check device type of the NSDs. Are you ok with deleting the file system and disks and building everything from scratch? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug-discuss-bounces at spectrumscale.org, gpfsug main discussion list Date: 07/04/2017 04:26 PM Subject: Re: [gpfsug-discuss] Fail to mount file system [root at LH20-GPFS1 ~]# mmmount fs_gpfs01 -a Tue Jul 4 13:52:07 IDT 2017: mmmount: Mounting file systems ... 
LH20-GPFS1: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmdsh: LH20-GPFS1 remote shell process had return code 32. LH20-GPFS2: mount: mount fs_gpfs01 on /fs_gpfs01 failed: Stale file handle mmdsh: LH20-GPFS2 remote shell process had return code 32. mmmount: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmmount -o rs /fs_gpfs01 mmmount: Mount point can not be a relative path name: rs [root at LH20-GPFS1 ~]# mmmount -o rs fs_gpfs01 mmmount: Mount point can not be a relative path name: rs I recieve in "dmesg": [ 18.338044] sd 2:0:0:1: [sdc] Attached SCSI disk [ 141.363422] hvt_cn_callback: unexpected netlink message! [ 141.366153] hvt_cn_callback: unexpected netlink message! [ 4479.292850] tracedev: loading out-of-tree module taints kernel. [ 4479.292888] tracedev: module verification failed: signature and/or required key missing - tainting kernel [ 4482.928413] ------------[ cut here ]------------ [ 4482.928445] WARNING: at fs/xfs/xfs_aops.c:906 xfs_do_writepage+0x537/0x550 [xfs]() [ 4482.928446] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 mbcache jbd2 loop intel_powerclamp iosf_mbi sg pcspkr hv_utils i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi hv_netvsc hyperv_keyboard hid_hyperv hv_storvsc hyperv_fb serio_raw fjes floppy libata hv_vmbus dm_mirror dm_region_hash dm_log dm_mod [ 4482.928471] CPU: 1 PID: 15210 Comm: mmfsd Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1 On Tue, Jul 4, 2017 at 11:36 AM, IBM Spectrum Scale wrote: > What exactly do you mean by "I have received existing corrupted GPFS 4.2.2 > lab"? > Is the file system corrupted ? Maybe this error is then due to file system > corruption. > > Can you once try: mmmount fs_gpfs01 -a > If this does not work then try: mmmount -o rs fs_gpfs01 > > Let me know which mount is working. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > > > From: Ilan Schwarts > To: gpfsug-discuss at spectrumscale.org > Date: 07/04/2017 01:47 PM > Subject: [gpfsug-discuss] Fail to mount file system > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi everyone, I have received existing corrupted GPFS 4.2.2 lab and I > am trying to make it work. 
> There are 2 nodes in a cluster: > [root at LH20-GPFS1 ~]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 LH20-GPFS1 active > 3 LH20-GPFS2 active > > The Cluster status is: > [root at LH20-GPFS1 ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: MyCluster.LH20-GPFS2 > GPFS cluster id: 10777108240438931454 > GPFS UID domain: MyCluster.LH20-GPFS2 > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > Node Daemon node name IP address Admin node name Designation > -------------------------------------------------------------------- > 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager > 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 > > There is a file system: > [root at LH20-GPFS1 ~]# mmlsnsd > > File system Disk name NSD servers > --------------------------------------------------------------------------- > fs_gpfs01 nynsd1 (directly attached) > fs_gpfs01 nynsd2 (directly attached) > > [root at LH20-GPFS1 ~]# > > On each Node, There is folder /fs_gpfs01 > The next step is to mount this fs_gpfs01 to be synced between the 2 nodes. > Whilte executing mmmount i get exception: > [root at LH20-GPFS1 ~]# mmmount /fs_gpfs01 > Tue Jul 4 11:14:18 IDT 2017: mmmount: Mounting file systems ... > mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type > mmmount: Command failed. Examine previous error messages to determine cause. > > > What am i doing wrong ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- - Ilan Schwarts -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Wed Jul 5 08:02:19 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 10:02:19 +0300 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Hi, [root at LH20-GPFS2 ~]# mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- nynsd1 0A0A9E3D594D5CA8 - - LH20-GPFS2 (not found) directly attached nynsd2 0A0A9E3D594D5CA9 - - LH20-GPFS2 (not found) directly attached mmmount failed with -o rs root at LH20-GPFS2 ~]# mmmount fs_gpfs01 -o rs Wed Jul 5 09:58:29 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. and in logs /var/adm/ras/mmfs.log.latest: 2017-07-05_09:58:30.009+0300: [I] Command: mount fs_gpfs01 2017-07-05_09:58:30.890+0300: Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: Wrong medium type 2017-07-05_09:58:30.890+0300: [E] Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: [W] Command: err 48: mount fs_gpfs01 From scale at us.ibm.com Wed Jul 5 08:44:19 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 5 Jul 2017 13:14:19 +0530 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: >From mmlsnsd output can see that the disks are not found by gpfs (maybe some connection issue or they have been changed/removed from backend) Please open a PMR and work with IBM support to resolve this. 
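A practical aside for anyone who lands in the same "(not found) directly attached" state: before (or while) opening the PMR it is worth confirming that the operating system still sees the LUNs behind nynsd1/nynsd2 at all (lsblk, ls -l /dev/disk/by-id/), and re-checking /var/adm/ras/mmfs.log.latest and mmlsnsd -X after any change. If the block devices are present but GPFS device discovery simply does not recognise their type, the nsddevices user exit suggested a little later in this thread can list them explicitly. The sketch below is illustrative only and is modelled on /usr/lpp/mmfs/samples/nsddevices.sample; the device names and the "generic" device type are assumptions to adapt, not details taken from this cluster:

#!/bin/bash
# Minimal /var/mmfs/etc/nsddevices sketch (adapt from nsddevices.sample).
# Each output line is "<device name relative to /dev> <device type>".
for dev in sdb sdc        # placeholder device names
do
    echo "$dev generic"
done
# Per the sample's comments: exit 0 to have GPFS use only the devices
# listed here, non-zero to continue with the built-in discovery as well.
# Check the sample shipped with your release for the exact convention.
exit 1

If this route is taken, the script has to be executable and installed on every node that accesses the disks directly; a subsequent mmlsnsd -X should then show a device path instead of "(not found)".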
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ilan Schwarts To: IBM Spectrum Scale Cc: gpfsug main discussion list , gpfsug-discuss-bounces at spectrumscale.org Date: 07/05/2017 12:32 PM Subject: Re: [gpfsug-discuss] Fail to mount file system Hi, [root at LH20-GPFS2 ~]# mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- nynsd1 0A0A9E3D594D5CA8 - - LH20-GPFS2 (not found) directly attached nynsd2 0A0A9E3D594D5CA9 - - LH20-GPFS2 (not found) directly attached mmmount failed with -o rs root at LH20-GPFS2 ~]# mmmount fs_gpfs01 -o rs Wed Jul 5 09:58:29 IDT 2017: mmmount: Mounting file systems ... mount: mount fs_gpfs01 on /fs_gpfs01 failed: Wrong medium type mmmount: Command failed. Examine previous error messages to determine cause. and in logs /var/adm/ras/mmfs.log.latest: 2017-07-05_09:58:30.009+0300: [I] Command: mount fs_gpfs01 2017-07-05_09:58:30.890+0300: Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: Wrong medium type 2017-07-05_09:58:30.890+0300: [E] Failed to open fs_gpfs01. 2017-07-05_09:58:30.890+0300: [W] Command: err 48: mount fs_gpfs01 -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Wed Jul 5 09:00:23 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Wed, 5 Jul 2017 10:00:23 +0200 Subject: [gpfsug-discuss] Fail to mount file system In-Reply-To: References: Message-ID: Hi, maybe you need to specify your NSDs via the nsddevices user exit (Identifies local physical devices that are used as GPFS Network Shared Disks (NSDs).). script to list the NSDs , place it under /var/mmfs/etc/nsddevices. There is a template under /usr/lpp/mmfs/samples/nsddevices.sample which should provide the necessary details. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From ilan84 at gmail.com Wed Jul 5 13:12:14 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 15:12:14 +0300 Subject: [gpfsug-discuss] update smb package ? 
Message-ID: Hi, while trying to enable SMB service i receive the following root at LH20-GPFS1 ~]# mmces service enable smb LH20-GPFS1: Cannot enable SMB service on LH20-GPFS1 LH20-GPFS1: mmcesop: Prerequisite libraries not found or correct version not LH20-GPFS1: installed. Ensure gpfs.smb is properly installed. LH20-GPFS1: mmcesop: Command failed. Examine previous error messages to determine cause. mmdsh: LH20-GPFS1 remote shell process had return code 1. Do i use normal yum update ? how to solve this issue ? Thanks From ilan84 at gmail.com Wed Jul 5 13:18:54 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 15:18:54 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs gpfs.ext-4.2.2-0.x86_64 gpfs.msg.en_US-4.2.2-0.noarch gpfs.gui-4.2.2-0.noarch gpfs.gpl-4.2.2-0.noarch gpfs.gskit-8.0.50-57.x86_64 gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 gpfs.adv-4.2.2-0.x86_64 gpfs.java-4.2.2-0.x86_64 gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 gpfs.base-4.2.2-0.x86_64 gpfs.crypto-4.2.2-0.x86_64 [root at LH20-GPFS1 ~]# uname -a Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux [root at LH20-GPFS1 ~]# From r.sobey at imperial.ac.uk Wed Jul 5 13:23:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 5 Jul 2017 12:23:10 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: You don't have the gpfs.smb package installed. Yum install gpfs.smb Or install the package manually from /usr/lpp/mmfs//smb_rpms [root at ces ~]# rpm -qa | grep gpfs gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan Schwarts Sent: 05 July 2017 13:19 To: gpfsug main discussion list Subject: [gpfsug-discuss] Fwd: update smb package ? [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs gpfs.ext-4.2.2-0.x86_64 gpfs.msg.en_US-4.2.2-0.noarch gpfs.gui-4.2.2-0.noarch gpfs.gpl-4.2.2-0.noarch gpfs.gskit-8.0.50-57.x86_64 gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 gpfs.adv-4.2.2-0.x86_64 gpfs.java-4.2.2-0.x86_64 gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 gpfs.base-4.2.2-0.x86_64 gpfs.crypto-4.2.2-0.x86_64 [root at LH20-GPFS1 ~]# uname -a Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux [root at LH20-GPFS1 ~]# _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Wed Jul 5 13:29:11 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 15:29:11 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? 
In-Reply-To: References: Message-ID: [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base | 3.6 kB 00:00:00 epel/x86_64/metalink | 24 kB 00:00:00 epel | 4.3 kB 00:00:00 extras | 3.4 kB 00:00:00 updates | 3.4 kB 00:00:00 (1/4): epel/x86_64/updateinfo | 789 kB 00:00:00 (2/4): extras/7/x86_64/primary_db | 188 kB 00:00:00 (3/4): epel/x86_64/primary_db | 4.8 MB 00:00:00 (4/4): updates/7/x86_64/primary_db | 7.7 MB 00:00:01 Loading mirror speeds from cached hostfile * base: centos.spd.co.il * epel: mirror.nonstop.co.il * extras: centos.spd.co.il * updates: centos.spd.co.il No package gpfs.smb available. Error: Nothing to do [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ something is missing in my machine :) On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: > You don't have the gpfs.smb package installed. > > > > Yum install gpfs.smb > > > > Or install the package manually from /usr/lpp/mmfs//smb_rpms > > > > [root at ces ~]# rpm -qa | grep gpfs > > gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 > > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan Schwarts > Sent: 05 July 2017 13:19 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Fwd: update smb package ? > > > > [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs > > gpfs.ext-4.2.2-0.x86_64 > > gpfs.msg.en_US-4.2.2-0.noarch > > gpfs.gui-4.2.2-0.noarch > > gpfs.gpl-4.2.2-0.noarch > > gpfs.gskit-8.0.50-57.x86_64 > > gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 > > gpfs.adv-4.2.2-0.x86_64 > > gpfs.java-4.2.2-0.x86_64 > > gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 > > gpfs.base-4.2.2-0.x86_64 > > gpfs.crypto-4.2.2-0.x86_64 > > [root at LH20-GPFS1 ~]# uname -a > > Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 > > 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux > > [root at LH20-GPFS1 ~]# > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From r.sobey at imperial.ac.uk Wed Jul 5 13:41:29 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 5 Jul 2017 12:41:29 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: Ah... yes you need to download the protocols version of gpfs from Fix Central. Same GPFS but with the SMB/Object etc packages. -----Original Message----- From: Ilan Schwarts [mailto:ilan84 at gmail.com] Sent: 05 July 2017 13:29 To: gpfsug main discussion list ; Sobey, Richard A Subject: Re: [gpfsug-discuss] Fwd: update smb package ? [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base | 3.6 kB 00:00:00 epel/x86_64/metalink | 24 kB 00:00:00 epel | 4.3 kB 00:00:00 extras | 3.4 kB 00:00:00 updates | 3.4 kB 00:00:00 (1/4): epel/x86_64/updateinfo | 789 kB 00:00:00 (2/4): extras/7/x86_64/primary_db | 188 kB 00:00:00 (3/4): epel/x86_64/primary_db | 4.8 MB 00:00:00 (4/4): updates/7/x86_64/primary_db | 7.7 MB 00:00:01 Loading mirror speeds from cached hostfile * base: centos.spd.co.il * epel: mirror.nonstop.co.il * extras: centos.spd.co.il * updates: centos.spd.co.il No package gpfs.smb available. 
Error: Nothing to do [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ something is missing in my machine :) On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: > You don't have the gpfs.smb package installed. > > > > Yum install gpfs.smb > > > > Or install the package manually from /usr/lpp/mmfs//smb_rpms > > > > [root at ces ~]# rpm -qa | grep gpfs > > gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 > > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan > Schwarts > Sent: 05 July 2017 13:19 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Fwd: update smb package ? > > > > [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs > > gpfs.ext-4.2.2-0.x86_64 > > gpfs.msg.en_US-4.2.2-0.noarch > > gpfs.gui-4.2.2-0.noarch > > gpfs.gpl-4.2.2-0.noarch > > gpfs.gskit-8.0.50-57.x86_64 > > gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 > > gpfs.adv-4.2.2-0.x86_64 > > gpfs.java-4.2.2-0.x86_64 > > gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 > > gpfs.base-4.2.2-0.x86_64 > > gpfs.crypto-4.2.2-0.x86_64 > > [root at LH20-GPFS1 ~]# uname -a > > Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 > > 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux > > [root at LH20-GPFS1 ~]# > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From ilan84 at gmail.com Wed Jul 5 14:08:39 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Wed, 5 Jul 2017 16:08:39 +0300 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: Sorry for newbish question, What do you mean by "from Fix Central", Do i need to define another repository for the yum ? or download manually ? its spectrum scale 4.2.2 On Wed, Jul 5, 2017 at 3:41 PM, Sobey, Richard A wrote: > Ah... yes you need to download the protocols version of gpfs from Fix Central. Same GPFS but with the SMB/Object etc packages. > > -----Original Message----- > From: Ilan Schwarts [mailto:ilan84 at gmail.com] > Sent: 05 July 2017 13:29 > To: gpfsug main discussion list ; Sobey, Richard A > Subject: Re: [gpfsug-discuss] Fwd: update smb package ? > > [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: fastestmirror, langpacks base > > | 3.6 kB 00:00:00 > epel/x86_64/metalink > > | 24 kB 00:00:00 > epel > > | 4.3 kB 00:00:00 > extras > > | 3.4 kB 00:00:00 > updates > > | 3.4 kB 00:00:00 > (1/4): epel/x86_64/updateinfo > > | 789 kB 00:00:00 > (2/4): extras/7/x86_64/primary_db > > | 188 kB 00:00:00 > (3/4): epel/x86_64/primary_db > > | 4.8 MB 00:00:00 > (4/4): updates/7/x86_64/primary_db > > | 7.7 MB 00:00:01 > Loading mirror speeds from cached hostfile > * base: centos.spd.co.il > * epel: mirror.nonstop.co.il > * extras: centos.spd.co.il > * updates: centos.spd.co.il > No package gpfs.smb available. > Error: Nothing to do > > > [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ > gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ > > > something is missing in my machine :) > > > On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A wrote: >> You don't have the gpfs.smb package installed. 
>> >> >> >> Yum install gpfs.smb >> >> >> >> Or install the package manually from /usr/lpp/mmfs//smb_rpms >> >> >> >> [root at ces ~]# rpm -qa | grep gpfs >> >> gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 >> >> >> >> >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan >> Schwarts >> Sent: 05 July 2017 13:19 >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] Fwd: update smb package ? >> >> >> >> [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs >> >> gpfs.ext-4.2.2-0.x86_64 >> >> gpfs.msg.en_US-4.2.2-0.noarch >> >> gpfs.gui-4.2.2-0.noarch >> >> gpfs.gpl-4.2.2-0.noarch >> >> gpfs.gskit-8.0.50-57.x86_64 >> >> gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 >> >> gpfs.adv-4.2.2-0.x86_64 >> >> gpfs.java-4.2.2-0.x86_64 >> >> gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 >> >> gpfs.base-4.2.2-0.x86_64 >> >> gpfs.crypto-4.2.2-0.x86_64 >> >> [root at LH20-GPFS1 ~]# uname -a >> >> Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 >> >> 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux >> >> [root at LH20-GPFS1 ~]# >> >> _______________________________________________ >> >> gpfsug-discuss mailing list >> >> gpfsug-discuss at spectrumscale.org >> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > -- > > > - > Ilan Schwarts -- - Ilan Schwarts From S.J.Thompson at bham.ac.uk Wed Jul 5 14:40:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 5 Jul 2017 13:40:46 +0000 Subject: [gpfsug-discuss] Fwd: update smb package ? In-Reply-To: References: Message-ID: IBM code comes from either IBM Passport Advantage (where you sign in with a corporate account that lists your product associations), or from IBM Fix Central (google it). Fix Central is supposed to be for service updates. Give the lack of experience, you may want to look at the install toolkit which ships with Spectrum Scale. Simon On 05/07/2017, 14:08, "gpfsug-discuss-bounces at spectrumscale.org on behalf of ilan84 at gmail.com" wrote: >Sorry for newbish question, >What do you mean by "from Fix Central", >Do i need to define another repository for the yum ? or download manually >? >its spectrum scale 4.2.2 > >On Wed, Jul 5, 2017 at 3:41 PM, Sobey, Richard A >wrote: >> Ah... yes you need to download the protocols version of gpfs from Fix >>Central. Same GPFS but with the SMB/Object etc packages. >> >> -----Original Message----- >> From: Ilan Schwarts [mailto:ilan84 at gmail.com] >> Sent: 05 July 2017 13:29 >> To: gpfsug main discussion list ; >>Sobey, Richard A >> Subject: Re: [gpfsug-discuss] Fwd: update smb package ? >> >> [root at LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins: >>fastestmirror, langpacks base >> >> | 3.6 kB 00:00:00 >> epel/x86_64/metalink >> >> | 24 kB 00:00:00 >> epel >> >> | 4.3 kB 00:00:00 >> extras >> >> | 3.4 kB 00:00:00 >> updates >> >> | 3.4 kB 00:00:00 >> (1/4): epel/x86_64/updateinfo >> >> | 789 kB 00:00:00 >> (2/4): extras/7/x86_64/primary_db >> >> | 188 kB 00:00:00 >> (3/4): epel/x86_64/primary_db >> >> | 4.8 MB 00:00:00 >> (4/4): updates/7/x86_64/primary_db >> >> | 7.7 MB 00:00:01 >> Loading mirror speeds from cached hostfile >> * base: centos.spd.co.il >> * epel: mirror.nonstop.co.il >> * extras: centos.spd.co.il >> * updates: centos.spd.co.il >> No package gpfs.smb available. 
>> Error: Nothing to do >> >> >> [root at LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/ >> gpfs_rpms/ license/ manifest zimon_debs/ zimon_rpms/ >> >> >> something is missing in my machine :) >> >> >> On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A >> wrote: >>> You don't have the gpfs.smb package installed. >>> >>> >>> >>> Yum install gpfs.smb >>> >>> >>> >>> Or install the package manually from /usr/lpp/mmfs//smb_rpms >>> >>> >>> >>> [root at ces ~]# rpm -qa | grep gpfs >>> >>> gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64 >>> >>> >>> >>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ilan >>> Schwarts >>> Sent: 05 July 2017 13:19 >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] Fwd: update smb package ? >>> >>> >>> >>> [root at LH20-GPFS1 ~]# rpm -qa | grep gpfs >>> >>> gpfs.ext-4.2.2-0.x86_64 >>> >>> gpfs.msg.en_US-4.2.2-0.noarch >>> >>> gpfs.gui-4.2.2-0.noarch >>> >>> gpfs.gpl-4.2.2-0.noarch >>> >>> gpfs.gskit-8.0.50-57.x86_64 >>> >>> gpfs.gss.pmsensors-4.2.2-0.el7.x86_64 >>> >>> gpfs.adv-4.2.2-0.x86_64 >>> >>> gpfs.java-4.2.2-0.x86_64 >>> >>> gpfs.gss.pmcollector-4.2.2-0.el7.x86_64 >>> >>> gpfs.base-4.2.2-0.x86_64 >>> >>> gpfs.crypto-4.2.2-0.x86_64 >>> >>> [root at LH20-GPFS1 ~]# uname -a >>> >>> Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 >>> >>> 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux >>> >>> [root at LH20-GPFS1 ~]# >>> >>> _______________________________________________ >>> >>> gpfsug-discuss mailing list >>> >>> gpfsug-discuss at spectrumscale.org >>> >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> -- >> >> >> - >> Ilan Schwarts > > > >-- > > >- >Ilan Schwarts >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From hpc-luke at uconn.edu Wed Jul 5 15:52:52 2017 From: hpc-luke at uconn.edu (hpc-luke at uconn.edu) Date: Wed, 05 Jul 2017 10:52:52 -0400 Subject: [gpfsug-discuss] Mass UID migration suggestions Message-ID: <595cfd44.kc2G2OUXdgiX+srO%hpc-luke@uconn.edu> Thank you both, I was already using the c++ stl hash map to do the mapping of uid_t to uid_t, but I will use that example to learn how to use the proper gpfs apis. And thank you for the ACL suggestion, as that is likely the best way to handle certain users who are logged in/running jobs constantly, where we would not like to force them to logout. And thank you for the reminder to re-run backups. Thank you for your time, Luke Storrs-HPC University of Connecticut From mweil at wustl.edu Wed Jul 5 16:51:50 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 5 Jul 2017 10:51:50 -0500 Subject: [gpfsug-discuss] pmcollector node Message-ID: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> Hello all, Question on the requirements on pmcollector node/s for a 500+ node cluster. Is there a sizing guide? What specifics should we scale? CPU Disks memory? Thanks Matt From kkr at lbl.gov Wed Jul 5 17:23:38 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 5 Jul 2017 09:23:38 -0700 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Message-ID: As I understand it, there is currently no way to collect just a subset of stats in a category. 
For example, CPU stats are: cpu_contexts cpu_guest cpu_guest_nice cpu_hiq cpu_idle cpu_interrupts cpu_iowait cpu_nice cpu_siq cpu_steal cpu_system cpu_user but I'm only interested in tracking a subset. The config file seems to want the category "CPU" which seems like an all-or-nothing approach. I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? Thanks, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 5 18:00:44 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 5 Jul 2017 17:00:44 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Message-ID: <11A5144D-A5AF-4829-B7D4-4313F357C6CB@nuance.com> Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Jul 5 19:22:14 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 5 Jul 2017 11:22:14 -0700 Subject: [gpfsug-discuss] Meaning of API Stats Category In-Reply-To: References: Message-ID: Thank you Eric. That did help. On Mon, Jun 12, 2017 at 2:01 PM, IBM Spectrum Scale wrote: > Hello Kristy, > > The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of > view of "applications" in the sense that they provide stats about I/O > requests made to files in GPFS file systems from user level applications > using POSIX interfaces like open(), close(), read(), write(), etc. > > This is in contrast to similarly named sensors without the "API" suffix, > like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O > requests made by the GPFS code to NSDs (disks) making up GPFS file systems. > > The relationship between application I/O and disk I/O might or might not > be obvious. Consider some examples. An application that starts > sequentially reading a file might, at least initially, cause more disk I/O > than expected because GPFS has decided to prefetch data. An application > write() might not immediately cause a the writing of disk blocks due to the > operation of the pagepool. Ultimately, application write()s might cause > twice as much data written to disk due to the replication factor of the > file system. Application I/O concerns itself with user data; disk I/O > might have to occur to handle the user data and associated file system > metadata (like inodes and indirect blocks). > > The difference between GPFSFileSystemAPI and GPFSNodeAPI: > GPFSFileSystemAPI reports stats for application I/O per filesystem per > node; GPFSNodeAPI reports application I/O stats per node. Similarly, > GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode > reports disk I/O stats per node. > > I hope this helps. 
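A practical footnote to the two perfmon threads above: collection granularity today is per sensor, controlled by each sensor's period in the sensor configuration, which is exactly the all-or-nothing behaviour behind the RFE idea. A quick way to see what is currently enabled is sketched below; the paths and commands assume a stock 4.2.x install where the sensor configuration is GPFS-managed, so treat them as a starting point rather than gospel:

# Show the sensor configuration GPFS distributes to the sensor nodes
mmperfmon config show

# Or inspect the generated file on a sensor node: each sensor stanza has
# a name (the category, e.g. "CPU") and a period in seconds, and a period
# of 0 disables that whole sensor; there is no per-metric switch.
grep -E 'name|period' /opt/IBM/zimon/ZIMonSensors.cfg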
> Eric Agar > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 06/12/2017 04:43 PM > Subject: Re: [gpfsug-discuss] Meaning of API Stats Category > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Kristy > > What I *think* the difference is: > > gpfs_fis: - calls to the GPFS file system interface > gpfs_fs: calls from the node that actually make it to the NSD > server/metadata > > The difference being what?s served out of the local node pagepool. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > *From: * on behalf of Kristy > Kallback-Rose > * Reply-To: *gpfsug main discussion list > > * Date: *Monday, June 12, 2017 at 3:17 PM > * To: *gpfsug main discussion list > * Subject: *[EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category > > Hi, > > Can anyone provide more detail about what is meant by the following two > categories of stats? The PDG has a limited description as far as I could > see. I'm not sure what is meant by Application PoV. Would the Grafana > bridge count as an "application"? > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfadden at us.ibm.com Wed Jul 5 19:50:24 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Wed, 5 Jul 2017 18:50:24 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks In-Reply-To: <11A5144D-A5AF-4829-B7D4-4313F357C6CB@nuance.com> Message-ID: What do You mean by category? Node class, metric type or something else? On Jul 5, 2017, 10:01:33 AM, Robert.Oesterlin at nuance.com wrote: From: Robert.Oesterlin at nuance.com To: gpfsug-discuss at spectrumscale.org Cc: Date: Jul 5, 2017 10:01:33 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sfadden at us.ibm.com Wed Jul 5 19:51:46 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Wed, 5 Jul 2017 18:51:46 +0000 Subject: [gpfsug-discuss] Zimon checks, ability to use a subset of checks In-Reply-To: Message-ID: Never mind just saw your earlier email On Jul 5, 2017, 11:50:24 AM, sfadden at us.ibm.com wrote: From: sfadden at us.ibm.com To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: Jul 5, 2017 11:50:24 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks What do You mean by category? Node class, metric type or something else? On Jul 5, 2017, 10:01:33 AM, Robert.Oesterlin at nuance.com wrote: From: Robert.Oesterlin at nuance.com To: gpfsug-discuss at spectrumscale.org Cc: Date: Jul 5, 2017 10:01:33 AM Subject: Re: [gpfsug-discuss] Zimon checks, ability to use a subset of checks Count me in! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Wednesday, July 5, 2017 at 11:23 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Zimon checks, ability to use a subset of checks I am thinking about submitting an RFE to request the ability to pick and choose checks within a category, but thought I'd float the idea here before I submit. Would others find value in this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Jul 6 06:37:33 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 6 Jul 2017 11:07:33 +0530 Subject: [gpfsug-discuss] pmcollector node In-Reply-To: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> References: <390fca36-e49a-7255-1ecf-a467a6ee92a5@wustl.edu> Message-ID: Hi Anna, Can you please check if you can answer this. Or else let me know who to contact for this. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Matt Weil To: gpfsug-discuss at spectrumscale.org Date: 07/05/2017 09:22 PM Subject: [gpfsug-discuss] pmcollector node Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello all, Question on the requirements on pmcollector node/s for a 500+ node cluster. Is there a sizing guide? What specifics should we scale? CPU Disks memory? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Wei1.Guo at UTSouthwestern.edu Thu Jul 6 18:49:32 2017 From: Wei1.Guo at UTSouthwestern.edu (Wei Guo) Date: Thu, 6 Jul 2017 17:49:32 +0000 Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory Message-ID: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> Hi, All, We are testing to upgrade our clients to new RHEL 7.3 kernel with GPFS 4.2.1.0. When we have 3.10.0-514.26.2.el7, installing the gplbin has the following errors: # ./mmbuildgpl --build-package -v # cd /root/rpmbuild/RPMS/x86_64/ # rpm -ivh gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64.rpm Running transaction Installing : gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64 1/1 depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory depmod: ERROR: fstatat(4, mmfslinux.ko): No such file or directory depmod: ERROR: fstatat(4, tracedev.ko): No such file or directory depmod -a also show the three kernel extension not found. However, in the following directory, they are there. # pwd /lib/modules/3.10.0-514.26.2.el7.x86_64/extra # ls kernel mmfs26.ko mmfslinux.ko tracedev.ko The error does not show in a slightly older kernel -3.10.0-514.21.2 version. From https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Table 29, both versions should be supported. RHEL Distribution Latest Kernel Level Tested1 Minimum Kernel Level Required2 Minimum IBM Spectrum Scale Level Tested3 Minimum IBM Spectrum Scale Level Supported4 7.3 3.10.0-514 3.10.0-514 V4.1.1.11/V4.2.2.1 V4.1.1.11/V4.2.1.2 For technical reasons, this test node will not be added to production. A previous thread http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-April/001529.html indicated that this will be OK. However, it is better to get a clear conclusion before we update other client nodes. Shall we recompile the kernel? Thanks all. Wei Guo ________________________________ UT Southwestern Medical Center The future of medicine, today. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 6 18:52:44 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 6 Jul 2017 17:52:44 +0000 Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory In-Reply-To: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> References: <7304eaf93aa74265ae45a214288dfe4c@SWMS13MAIL10.swmed.org> Message-ID: Look in the kernel weak-updates directory, you will probably find some broken files in there. These come from things trying to update the kernel modules when you do the kernel upgrade. Just delete the three gpfs related ones and run depmod The safest way is to remove the gpfs.gplbin packages, then upgrade the kernel, reboot and add the new gpfs.gplbin packages for the new kernel. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Wei Guo [Wei1.Guo at UTSouthwestern.edu] Sent: 06 July 2017 18:49 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory Hi, All, We are testing to upgrade our clients to new RHEL 7.3 kernel with GPFS 4.2.1.0. 
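(To make the weak-updates suggestion above concrete -- a rough sketch only, with the module names taken from the error output in this thread and the paths assumed to be the standard RHEL locations:

# look for dangling GPFS symlinks left behind by the kernel update
ls -l /lib/modules/$(uname -r)/weak-updates/ | grep -E 'mmfs26|mmfslinux|tracedev'

# remove the broken entries and rebuild the module dependency map
rm -f /lib/modules/$(uname -r)/weak-updates/mmfs26.ko
rm -f /lib/modules/$(uname -r)/weak-updates/mmfslinux.ko
rm -f /lib/modules/$(uname -r)/weak-updates/tracedev.ko
depmod -a

# the freshly installed modules should still be present under .../extra
ls /lib/modules/$(uname -r)/extra

This is an illustration, not an official procedure; the cleaner route is still to remove gpfs.gplbin before the kernel upgrade and install the package built for the new kernel afterwards.)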
When we have 3.10.0-514.26.2.el7, installing the gplbin has the following errors: # ./mmbuildgpl --build-package ?v # cd /root/rpmbuild/RPMS/x86_64/ # rpm -ivh gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64.rpm Running transaction Installing : gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.1-0.x86_64 1/1 depmod: ERROR: fstatat(4, mmfs26.ko): No such file or directory depmod: ERROR: fstatat(4, mmfslinux.ko): No such file or directory depmod: ERROR: fstatat(4, tracedev.ko): No such file or directory depmod -a also show the three kernel extension not found. However, in the following directory, they are there. # pwd /lib/modules/3.10.0-514.26.2.el7.x86_64/extra # ls kernel mmfs26.ko mmfslinux.ko tracedev.ko The error does not show in a slightly older kernel -3.10.0-514.21.2 version. From https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Table 29, both versions should be supported. RHEL Distribution Latest Kernel Level Tested1 Minimum Kernel Level Required2 Minimum IBM Spectrum Scale Level Tested3 Minimum IBM Spectrum Scale Level Supported4 7.3 3.10.0-514 3.10.0-514 V4.1.1.11/V4.2.2.1 V4.1.1.11/V4.2.1.2 For technical reasons, this test node will not be added to production. A previous thread http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-April/001529.html indicated that this will be OK. However, it is better to get a clear conclusion before we update other client nodes. Shall we recompile the kernel? Thanks all. Wei Guo ________________________________ UT Southwestern Medical Center The future of medicine, today. From abeattie at au1.ibm.com Thu Jul 6 06:07:07 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 6 Jul 2017 05:07:07 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14992893800360.png Type: image/png Size: 431718 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14992893800362.png Type: image/png Size: 1001127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14993172756190.png Type: image/png Size: 381651 bytes Desc: not available URL: From neil.wilson at metoffice.gov.uk Fri Jul 7 10:18:40 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Fri, 7 Jul 2017 09:18:40 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: Hi Andrew, Have you created new dashboards for GPFS? This shows you how to do it https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Creating%20Grafana%20dashboard Alternatively there are some predefined dashboards here https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Importing%20predefined%20Grafana%20dashboards that you can import and have a play around with? Regards Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. 
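In addition to the dashboards, it may be worth confirming that the collector and the bridge are actually serving data before digging into panel definitions. A rough sketch, with the port and metric names assumed from the knowledge-center defaults and option syntax that may vary slightly by release:

# ask the collector directly for a GPFS file system metric (GPFSFilesystem sensor)
mmperfmon query gpfs_fs_bytes_read -n 5

# list the configured sensors and their periods -- a period of 0 means that sensor is disabled
mmperfmon config show

# the Grafana bridge in the knowledge-center setup listens on port 4242 by default;
# check it is listening on the node the Grafana datasource points at
ss -ltn | grep 4242

If mmperfmon returns rows but Grafana stays empty, the datasource URL/port or the dashboard time range is the usual culprit; if mmperfmon itself returns only null values, the relevant sensor period is the first thing to check.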
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 06 July 2017 06:07 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Greetings, I'm currently setting up Grafana to interact with one of our Scale Clusters and i've followed the knowledge centre link in terms of setup. https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm However while everything appears to be working i'm not seeing any data coming through the reports within the grafana server, even though I can see data in the Scale GUI The current environment: [root at sc01n02 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: sc01.spectrum GPFS cluster id: 18085710661892594990 GPFS UID domain: sc01.spectrum Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------ 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon [root at sc01n02 ~]# [root at sc01n02 ~]# mmlsconfig Configuration data for cluster sc01.spectrum: --------------------------------------------- clusterName sc01.spectrum clusterId 18085710661892594990 autoload yes profile gpfsProtocolDefaults dmapiFileHandleSize 32 minReleaseLevel 4.2.2.0 ccrEnabled yes cipherList AUTHONLY maxblocksize 16M [cesNodes] maxMBpS 5000 numaMemoryInterleave yes enforceFilesetQuotaOnRoot yes workerThreads 512 [common] tscCmdPortRange 60000-61000 cesSharedRoot /ibm/cesSharedRoot/ces cifsBypassTraversalChecking yes syncSambaMetadataOps yes cifsBypassShareLocksOnRename yes adminMode central File systems in cluster sc01.spectrum: -------------------------------------- /dev/cesSharedRoot /dev/icos_demo /dev/scale01 [root at sc01n02 ~]# [root at sc01n02 ~]# systemctl status pmcollector ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. Loaded: loaded (/etc/rc.d/init.d/pmcollector) Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago Docs: man:systemd-sysv-generator(8) Main PID: 2693 (ZIMonCollector) CGroup: /system.slice/pmcollector.service ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance mon...... May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor collector... May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance moni...r.. Hint: Some lines were ellipsized, use -l to show in full. From Grafana Server: [cid:image002.jpg at 01D2F70A.17F595F0] when I send a set of files to the cluster (3.8GB) I can see performance metrics within the Scale GUI [cid:image004.jpg at 01D2F70A.17F595F0] yet from the Grafana Dashboard im not seeing any data points [cid:image006.jpg at 01D2F70A.17F595F0] Can anyone provide some hints as to what might be happening? Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 14522 bytes Desc: image002.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 60060 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 25781 bytes Desc: image006.jpg URL: From olaf.weiser at de.ibm.com Fri Jul 7 10:18:13 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 7 Jul 2017 09:18:13 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 431718 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1001127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 381651 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Jul 7 13:01:39 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 7 Jul 2017 12:01:39 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: Just following up on this, has anyone successfully deployed Protocols (SMB) on RHEL 7.3 with the 4.2.3-2 packages? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 July 2017 12:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. 
Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Jul 7 23:32:40 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 7 Jul 2017 15:32:40 -0700 Subject: [gpfsug-discuss] Hold the Date - Spectrum Scale Day @ HPCXXL (Sept 2017, NYC) Message-ID: Hello, More details will be provided as they become available, but just so you can make a placeholder on your calendar, there will be a Spectrum Scale Day the week of September 25th - 29th, likely Thursday, September 28th. This will be a part of the larger HPCXXL meeting (https://www.spxxl.org/?q=New-York-City-2017 ). You may recall this group was formerly called SPXXL and the website is in the process of transitioning to the new name (and getting a new certificate). You will be able to attend *just* the Spectrum Scale day if that is the only portion of the event you would like to attend. The NYC location is great for Spectrum Scale events because many IBMers, including developers, can come in from Poughkeepsie. More as we get closer to the date and details are settled. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Sun Jul 9 08:26:44 2017 From: a.khiredine at meteo.dz (Atmane) Date: Sun, 9 Jul 2017 08:26:44 +0100 Subject: [gpfsug-discuss] GPFS Storage Server (GSS) Message-ID: From a.khiredine at meteo.dz Sun Jul 9 09:00:07 2017 From: a.khiredine at meteo.dz (Atmane) Date: Sun, 9 Jul 2017 09:00:07 +0100 Subject: [gpfsug-discuss] get free space in GSS Message-ID: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From laurence at qsplace.co.uk Sun Jul 9 09:58:05 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Sun, 09 Jul 2017 09:58:05 +0100 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: Message-ID: You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: >Dear all, > >My name is Khiredine Atmane and I am a HPC system administrator at the > >National Office of Meteorology Algeria . We have a GSS24 running >gss2.5.10.3-3b and gpfs-4.2.0.3. 
> >GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks >total, 0 >NVRAM partitions > >disks = 3Tb >SSD = 200 Gb >df -h >Filesystem Size Used Avail Use% Mounted on > >/dev/gpfs1 49T 18T 31T 38% /gpfs1 >/dev/gpfs2 53T 13T 40T 25% /gpfs2 >/dev/gpfs3 25T 4.9T 20T 21% /gpfs3 >/dev/gpfs4 11T 133M 11T 1% /gpfs4 >/dev/gpfs5 323T 34T 290T 11% /gpfs5 > >Total Is 461 To > >I think we have more space >Could anyone make recommendation to troubleshoot find how many free >space >in GSS ? >How to find the available space ? >Thank you! > >Atmane > > > >-- >Atmane Khiredine >HPC System Admin | Office National de la M?t?orologie >T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : >a.khiredine at meteo.dz >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Sun Jul 9 13:26:26 2017 From: a.khiredine at meteo.dz (atmane khiredine) Date: Sun, 9 Jul 2017 12:26:26 +0000 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: , Message-ID: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> thank you very much for replying. I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual 
rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. 
From janfrode at tanso.net Sun Jul 9 17:45:32 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sun, 09 Jul 2017 16:45:32 +0000 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> References: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: You had it here: [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low 12 GiB in DA1, and 4096 MiB i DA2, but effectively you'll get less when you add a raidCode to the vdisk. Best way to use it id to just don't specify a size to the vdisk, and max possible size will be used. -jf s?n. 9. jul. 2017 kl. 14.26 skrev atmane khiredine : > thank you very much for replying. I can not find the free space > > Here is the output of mmlsrecoverygroup > > [root at server1 ~]#mmlsrecoverygroup > > declustered > arrays with > recovery group vdisks vdisks servers > ------------------ ----------- ------ ------- > BB1RGL 3 18 server1,server2 > BB1RGR 3 18 server2,server1 > -------------------------------------------------------------- > [root at server ~]# mmlsrecoverygroup BB1RGL -L > > declustered > recovery group arrays vdisks pdisks format version > ----------------- ----------- ------ ------ -------------- > BB1RGL 3 18 119 4.2.0.1 > > declustered needs replace > scrub background activity > array service vdisks pdisks spares threshold free space > duration task progress priority > ----------- ------- ------ ------ ------ --------- ---------- > -------- ------------------------- > LOG no 1 3 0,0 1 558 GiB 14 > days scrub 51% low > DA1 no 11 58 2,31 2 12 GiB 14 > days scrub 78% low > DA2 no 6 58 2,31 2 4096 MiB 14 > days scrub 10% low > > declustered > checksum > vdisk RAID code array vdisk size block > size granularity state remarks > ------------------ ------------------ ----------- ---------- > ---------- ----------- ----- ------- > gss0_logtip 3WayReplication LOG 128 MiB 1 > MiB 512 ok logTip > gss0_loghome 4WayReplication DA1 40 GiB 1 > MiB 512 ok log > BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 > MiB 32 KiB ok > BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 > MiB 32 KiB ok > BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 > MiB 32 KiB ok > BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 > MiB 32 KiB ok > BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 > MiB 32 KiB ok > > config data declustered 
array VCD spares actual rebuild > spare space remarks > ------------------ ------------------ ------------- > --------------------------------- ---------------- > rebuild space DA1 31 34 pdisk > rebuild space DA2 31 35 pdisk > > > config data max disk group fault tolerance actual disk group > fault tolerance remarks > ------------------ --------------------------------- > --------------------------------- ---------------- > rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 > drawer limiting fault tolerance > system index 2 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > > vdisk max disk group fault tolerance actual disk group > fault tolerance remarks > ------------------ --------------------------------- > --------------------------------- ---------------- > gss0_logtip 2 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS4_DATA1 2 drawer 2 drawer > BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS1_DATA1 2 drawer 2 drawer > BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS3_DATA1 2 drawer 2 drawer > BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS2_DATA1 2 drawer 2 drawer > BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS2_DATA2 2 drawer 2 drawer > BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS1_DATA2 2 drawer 2 drawer > BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 > drawer > BB1RGL_GPFS5_DATA1 2 drawer 2 drawer > BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 > drawer limited by rg descriptor > BB1RGL_GPFS5_DATA2 2 drawer 2 drawer > > active recovery group server servers > ----------------------------------------------- ------- > server1 server1,server2 > > > Atmane Khiredine > HPC System Administrator | Office National de la M?t?orologie > T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : > a.khiredine at meteo.dz > ________________________________ > De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] > Envoy? : dimanche 9 juillet 2017 09:58 > ? : gpfsug main discussion list; atmane khiredine; > gpfsug-discuss at spectrumscale.org > Objet : Re: [gpfsug-discuss] get free space in GSS > > You can check the recovery groups to see if there is any remaining space. > > I don't have access to my test system to confirm the syntax however if > memory serves. > > Run mmlsrecoverygroup to get a list of all the recovery groups then: > > mmlsrecoverygroup -L > > This will list all your declustered arrays and their free space. > > Their might be another method, however this way has always worked well for > me. > > -- Lauz > > > > On 9 July 2017 09:00:07 BST, Atmane wrote: > > Dear all, > > My name is Khiredine Atmane and I am a HPC system administrator at the > National Office of Meteorology Algeria . We have a GSS24 running > gss2.5.10.3-3b and gpfs-4.2.0.3. 
> > GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 > NVRAM partitions > > disks = 3Tb > SSD = 200 Gb > df -h > Filesystem Size Used Avail Use% Mounted on > > /dev/gpfs1 49T 18T 31T 38% /gpfs1 > /dev/gpfs2 53T 13T 40T 25% /gpfs2 > /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 > /dev/gpfs4 11T 133M 11T 1% /gpfs4 > /dev/gpfs5 323T 34T 290T 11% /gpfs5 > > Total Is 461 To > > I think we have more space > Could anyone make recommendation to troubleshoot find how many free space > in GSS ? > How to find the available space ? > Thank you! > > Atmane > > > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Sun Jul 9 17:52:02 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Sun, 9 Jul 2017 12:52:02 -0400 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> References: , <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: Hi Atmane, >> I can not find the free space Based on your output below, your setup currently has two recovery groups BB1RGL and BB1RGR. Issue "mmlsrecoverygroup BB1RGL -L" and "mmlsrecoverygroup BB1RGR -L" to obtain free space in each DA. Based on your "mmlsrecoverygroup BB1RGL -L" output below, BB1RGL "DA1" has 12GiB and "DA2" has 4GiB free space. The metadataOnly and dataOnly vdisk/NSD are created from DA1 and DA2. declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low In addition, you may use "mmlsnsd" to obtain mapping of file-system to vdisk/NSD + use "mmdf " command to query user or available capacity on a GPFS file system. Hope this helps, -Kums From: atmane khiredine To: Laurence Horrocks-Barlow , "gpfsug main discussion list" Date: 07/09/2017 08:27 AM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org thank you very much for replying. 
I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 
drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.khiredine at meteo.dz Mon Jul 10 10:39:27 2017 From: a.khiredine at meteo.dz (Atmane) Date: Mon, 10 Jul 2017 10:39:27 +0100 Subject: [gpfsug-discuss] New Version Of GSS 3.1b 16-Feb-2017 Message-ID: Dear all, There is a new version of GSS Is there someone who made the update ? thanks Lenovo System x GPFS Storage Server (GSS) Version 3.1b 16-Feb-2017 What?s new in Lenovo GSS, Version 3.1 ? New features: - RHEL 7.2 ? GSS Expandability ? Online addition of more JBODs to an existing GSS building block (max. 6 JBOD total) ? Must be same JBOD type and drive type as in the existing building block ? Selectable Spectrum Scale (GPFS) software version and edition ?Four GSS tarballs, for Spectrum Scale {Standard or Advanced Edition} @ {v4.1.1 or v4.2.1} ? Hardware news: ? 10TB drive support: two JBOD MTMs (0796-HCJ/16X and 0796-HCK/17X), drive FRU (01GV110), no drive option ? Withdrawal of the 3TB drive models (0796-HC3/07X and 0796-HC4/08X) ? GSS22 in xConfig (no more need for special-bid) ? Software and firmware news: ? Update of IBM Spectrum Scale v4.2.1 to latest PTF level ? Update of Intel OPA from 10.1 to 10.2 (incl. performance fixes) ? 
Refresh of server and adapter FW levels to Scalable Infrastructure ?16C? recommended levels ? Not much news this time, as ?16C? FW is almost identical to ?16B - List GPFS RPM gpfs.adv-4.2.1-2.12.x86_64.rpm gpfs.base-4.2.1-2.12.x86_64.rpm gpfs.callhome-4.2.1-1.000.el7.noarch.rpm gpfs.callhome-ecc-client-4.2.1-1.000.noarch.rpm gpfs.crypto-4.2.1-2.12.x86_64.rpm gpfs.docs-4.2.1-2.12.noarch.rpm gpfs.ext-4.2.1-2.12.x86_64.rpm gpfs.gnr-4.2.1-2.12.x86_64.rpm gpfs.gnr.base-1.0.0-0.x86_64.rpm gpfs.gpl-4.2.1-2.12.noarch.rpm gpfs.gskit-8.0.50-57.x86_64.rpm gpfs.gss.firmware-4.2.0-5.x86_64.rpm gpfs.gss.pmcollector-4.2.2-2.el7.x86_64.rpm gpfs.gss.pmsensors-4.2.2-2.el7.x86_64.rpm gpfs.gui-4.2.1-2.3.noarch.rpm gpfs.java-4.2.2-2.x86_64.rpm gpfs.msg.en_US-4.2.1-2.12.noarch.rpm -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From Greg.Lehmann at csiro.au Tue Jul 11 05:54:39 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Tue, 11 Jul 2017 04:54:39 +0000 Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes In-Reply-To: References: Message-ID: <4c9ae144c1114b85b7f2cdc27eefd749@exch1-cdc.nexus.csiro.au> Yes, although it is early days for us and I would not say we have finished testing as yet. We have upgraded twice to get there from 4.2.3-0. It seems OK and I have not noticed any changes from 4.2.3.0. Greg From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: Friday, 7 July 2017 10:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Just following up on this, has anyone successfully deployed Protocols (SMB) on RHEL 7.3 with the 4.2.3-2 packages? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 July 2017 12:12 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes OK Simon, thanks. I suppose we're all in the same boat having to get change management approval etc! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 04 July 2017 12:09 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes AFAIK. Always. We have had the service eat itself BTW by having different code releases and trying this. Yes its a PITA that we have to get a change approval for it (so we don't do it as often as we should)... The upgrade process upgrades the SMB registry, we have also seen the CTDB lock stuff break when they are not running the same code release, so now we just don't do this. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 4 July 2017 at 11:54 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes Hi all, For how long has this requirement been in force, and why? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm All protocol nodes running the SMB service must have the same version of gpfs.smb installed at any time. 
This requires a brief outage of the SMB service to upgrade gpfs.smb to the newer version across all protocol nodes. The procedure outlined here is intended to reduce the outage to a minimum. Previously I've upgraded nodes one at a time over the course of a few days. Is the impact just that we won't be supported, or will a hole open up beneath my feet and swallow me whole? I really don't fancy the headache of getting approvals to get an outage of even 5 minutes at 6am.... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Tue Jul 11 10:36:39 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Tue, 11 Jul 2017 09:36:39 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA Message-ID: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 From abeattie at au1.ibm.com Tue Jul 11 11:14:37 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Tue, 11 Jul 2017 10:14:37 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Message-ID: An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Jul 11 15:46:42 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Jul 2017 14:46:42 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> Message-ID: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Tuesday, July 11, 2017 5:15 AM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org; jake.carroll at uq.edu.au Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA Bilich, Reach out to Jake Carrol at Uni of QLD UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet and there is LOTS of tuning that you can do to improve how things work Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Billich Heinrich Rainer (PSI)" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [gpfsug-discuss] does AFM support NFS via RDMA Date: Tue, Jul 11, 2017 7:36 PM Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? 
I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jake.carroll at uq.edu.au Tue Jul 11 22:38:43 2017 From: jake.carroll at uq.edu.au (Jake Carroll) Date: Tue, 11 Jul 2017 21:38:43 +0000 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Message-ID: <72D0CC62-8663-4072-AFA1-735D75EEBBE1@uq.edu.au> I?ll be there! From: Bryan Banister Date: Wednesday, 12 July 2017 at 12:46 am To: gpfsug main discussion list Cc: Jake Carroll Subject: RE: [gpfsug-discuss] does AFM support NFS via RDMA Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Tuesday, July 11, 2017 5:15 AM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org; jake.carroll at uq.edu.au Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA Bilich, Reach out to Jake Carrol at Uni of QLD UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet and there is LOTS of tuning that you can do to improve how things work Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Billich Heinrich Rainer (PSI)" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [gpfsug-discuss] does AFM support NFS via RDMA Date: Tue, Jul 11, 2017 7:36 PM Hello, We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? 
I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. We run spectrum scale 4.2.2/4.2.3 on Redhat 7. Thank you, Heiner Billich -- Paul Scherrer Institut Heiner Billich WHGA 106 CH 5232 Villigen 056 310 36 02 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Tue Jul 11 23:07:49 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 11 Jul 2017 15:07:49 -0700 Subject: [gpfsug-discuss] does AFM support NFS via RDMA In-Reply-To: <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> References: <3478917F-7864-4D84-847F-1ED249098AD9@psi.ch> <44d78fdff53f457a998f240cdf4510d0@jumptrading.com> Message-ID: <9BA6A8E3-D633-4DFF-826F-5ACE49361694@lbl.gov> Sounds good. Is someone willing to take on this talk? User-driven talks on real experiences are always welcome. Cheers, Kristy > On Jul 11, 2017, at 7:46 AM, Bryan Banister wrote: > > Sounds like a very interesting topic for an upcoming GPFS UG meeting? say SC?17? > -B > > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org ] On Behalf Of Andrew Beattie > Sent: Tuesday, July 11, 2017 5:15 AM > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org ; jake.carroll at uq.edu.au > Subject: Re: [gpfsug-discuss] does AFM support NFS via RDMA > > Bilich, > > Reach out to Jake Carrol at Uni of QLD > > UQ have been playing with NFS over 10GB / 40GB and 100GB Ethernet > and there is LOTS of tuning that you can do to improve how things work > > Regards, > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: "Billich Heinrich Rainer (PSI)" > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org " > > Cc: > Subject: [gpfsug-discuss] does AFM support NFS via RDMA > Date: Tue, Jul 11, 2017 7:36 PM > > Hello, > > We run AFM using NFS as transport between home and cache. Using IP-over-Infiniband we see a throughput between 1 and 2 GB/s. This is not bad but far from what a native IB link provides ? 6GB/s . Does AFM?s nfs client on gateway nodes support NFS using RDMA? I would like to try. Or should we try to tune nfs and the IP stack ? I wonder if anybody got throughput above 2 GB/s using IPoIB and FDR between two nodes? 
> > We can?t use a native gpfs multicluster mount ? this links home and cache much too strong: If home fails cache will unmount the cache fileset ? this is what I get from the manuals. > > We run spectrum scale 4.2.2/4.2.3 on Redhat 7. > > Thank you, > > Heiner Billich > > -- > Paul Scherrer Institut > Heiner Billich > WHGA 106 > CH 5232 Villigen > 056 310 36 02 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 12 17:06:40 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 12 Jul 2017 16:06:40 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Message-ID: Interesting. Performance is one thing, but how usable. IBM, watch your back :-) ?WekaIO is the world?s fastest distributed file system, processing four times the workload compared to IBM Spectrum Scale measured on Standard Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. Utilizing only 120 cloud compute instances with locally attached storage, WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM?s high-end FlashSystem 900.? https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Jul 12 18:24:19 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 12 Jul 2017 17:24:19 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: while i really like competition on SpecSFS, the claims from the WekaIO people are lets say 'alternative facts' at best The Spectrum Scale results were done on 4 Nodes with 2 Flash Storage devices attached, they compare this to a WekaIO system with 14 times more memory (14 TB vs 1TB) , 120 SSD's (vs 64 Flashcore Modules) across 15 times more compute nodes (60 vs 4) . said all this, the article claims 1000 builds, while the actual submission only delivers 500 --> https://www.spec.org/sfs2014/results/sfs2014.html so they need 14 times more memory and cores and 2 times flash to show twice as many builds at double the response time, i leave this to everybody who understands this facts to judge how great that result really is. 
Said all this, Spectrum Scale scales almost linear if you double the nodes , network and storage accordingly, so there is no reason to believe we couldn't easily beat this, its just a matter of assemble the HW in a lab and run the test. btw we scale to 10k+ nodes , 2500 times the number we used in our publication :-D Sven On Wed, Jul 12, 2017 at 9:06 AM Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > Interesting. Performance is one thing, but how usable. IBM, watch your > back :-) > > > > *?WekaIO is the world?s fastest distributed file system, processing four > times the workload compared to IBM Spectrum Scale measured on Standard > Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry > benchmark. Utilizing only 120 cloud compute instances with locally attached > storage, WekaIO completed 1,000 simultaneous software builds compared to > 240 on IBM?s high-end FlashSystem 900.?* > > > > > https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 <(507)%20269-0413> > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Jul 12 19:20:06 2017 From: ewahl at osc.edu (Edward Wahl) Date: Wed, 12 Jul 2017 14:20:06 -0400 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: <20170712142006.297cc9f2@osc.edu> Ah benchmarks... There are Lies, damn Lies, and then benchmarks. I've been in HPC a while on both the vendor (Cray) and customer side, and until I see Lustre, BeeGFS, Spectrum Scale, StorNext, OrangeFS, CEPH, Gluster, 'Flash in the pan v1', etc. all run on the EXACT same hardware I take ALL benchmarks with a POUND of salt. Too easy to finagle whatever result you want. Besides, benchmarks and real world performance are vastly different unless you are using IO kernels based on your local apps as your benchmark. I have a feeling MANY of the folks on this list feel similarly. ;) I recall when we figured out how someone cheated a SPEC test once by only using the inner-track of drives. ^_^ Ed On Wed, 12 Jul 2017 16:06:40 +0000 "Oesterlin, Robert" wrote: > Interesting. Performance is one thing, but how usable. IBM, watch your > back :-) > > ?WekaIO is the world?s fastest distributed file system, processing four times > the workload compared to IBM Spectrum Scale measured on Standard Performance > Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. > Utilizing only 120 cloud compute instances with locally attached storage, > WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM?s > high-end FlashSystem 900.? > > https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From r.sobey at imperial.ac.uk Wed Jul 12 19:20:32 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 12 Jul 2017 18:20:32 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System In-Reply-To: References: Message-ID: I'm reading it as "WeakIO" which probably isn't a good thing.. 
both in the context of my eyesight and the negative connotation of the product :) ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Oesterlin, Robert Sent: 12 July 2017 17:06 To: gpfsug main discussion list Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Interesting. Performance is one thing, but how usable. IBM, watch your back :-) "WekaIO is the world's fastest distributed file system, processing four times the workload compared to IBM Spectrum Scale measured on Standard Performance Evaluation Corp. (SPEC) SFS 2014, an independent industry benchmark. Utilizing only 120 cloud compute instances with locally attached storage, WekaIO completed 1,000 simultaneous software builds compared to 240 on IBM's high-end FlashSystem 900." https://www.hpcwire.com/off-the-wire/wekaio-unveils-cloud-native-scalable-file-system/ Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 12 19:27:12 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 12 Jul 2017 18:27:12 +0000 Subject: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System Message-ID: <92349D18-3614-4235-B30C-ADCCE3782CDD@nuance.com> Ah yes - Sven keeping us honest! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Wednesday, July 12, 2017 at 12:24 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System while i really like competition on SpecSFS, the claims from the WekaIO people are lets say 'alternative facts' at best The Spectrum Scale results were done on 4 Nodes with 2 Flash Storage devices attached, they compare this to a WekaIO system with 14 times more memory (14 TB vs 1TB) , 120 SSD's (vs 64 Flashcore Modules) across 15 times more compute nodes (60 vs 4) . said all this, the article claims 1000 builds, while the actual submission only delivers 500 --> https://www.spec.org/sfs2014/results/sfs2014.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From sannaik2 at in.ibm.com Fri Jul 14 06:55:30 2017 From: sannaik2 at in.ibm.com (Sandeep Naik1) Date: Fri, 14 Jul 2017 11:25:30 +0530 Subject: [gpfsug-discuss] get free space in GSS In-Reply-To: References: , <4B32CB5C696F2849BDEF7DF9EACE884B5B41A465@SDEB-EXC02.meteo.dz> Message-ID: Hi Atmane, There can be two meanings of available free space. One is what is available in the existing filesystems. For this you rightly referred to the df -h command output. This is the actual free space available in the already created filesystems. Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 The other is the free space available in the DAs. For that, as everyone said, use mmlsrecoverygroup -L. Please note that this will give you raw free capacity. For the usable free capacity in a DA you have to account for the RAID overhead. But based on your output you have very little/no free space left in the DAs.
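If you want to check both views quickly from the command line, something like this should do it (just a sketch - I am reusing the recovery group and filesystem names from your own output, adjust for your setup):

# raw free space per declustered array (DA1/DA2) in each recovery group
mmlsrecoverygroup BB1RGL -L
mmlsrecoverygroup BB1RGR -L

# which vdisks/NSDs back which file system
mmlsnsd

# user-visible free capacity per GPFS file system (same view as df -h)
mmdf gpfs5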
[root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low Thanks, Sandeep Naik Elastic Storage server / GPFS Test ETZ-B, Hinjewadi Pune India (+91) 8600994314 From: "Kumaran Rajaram" To: gpfsug main discussion list , atmane khiredine Date: 09/07/2017 10:22 PM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Atmane, >> I can not find the free space Based on your output below, your setup currently has two recovery groups BB1RGL and BB1RGR. Issue "mmlsrecoverygroup BB1RGL -L" and "mmlsrecoverygroup BB1RGR -L" to obtain free space in each DA. Based on your "mmlsrecoverygroup BB1RGL -L" output below, BB1RGL "DA1" has 12GiB and "DA2" has 4GiB free space. The metadataOnly and dataOnly vdisk/NSD are created from DA1 and DA2. declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low In addition, you may use "mmlsnsd" to obtain mapping of file-system to vdisk/NSD + use "mmdf " command to query user or available capacity on a GPFS file system. Hope this helps, -Kums From: atmane khiredine To: Laurence Horrocks-Barlow , "gpfsug main discussion list" Date: 07/09/2017 08:27 AM Subject: Re: [gpfsug-discuss] get free space in GSS Sent by: gpfsug-discuss-bounces at spectrumscale.org thank you very much for replying. 
I can not find the free space Here is the output of mmlsrecoverygroup [root at server1 ~]#mmlsrecoverygroup declustered arrays with recovery group vdisks vdisks servers ------------------ ----------- ------ ------- BB1RGL 3 18 server1,server2 BB1RGR 3 18 server2,server1 -------------------------------------------------------------- [root at server ~]# mmlsrecoverygroup BB1RGL -L declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- BB1RGL 3 18 119 4.2.0.1 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0,0 1 558 GiB 14 days scrub 51% low DA1 no 11 58 2,31 2 12 GiB 14 days scrub 78% low DA2 no 6 58 2,31 2 4096 MiB 14 days scrub 10% low declustered checksum vdisk RAID code array vdisk size block size granularity state remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ----- ------- gss0_logtip 3WayReplication LOG 128 MiB 1 MiB 512 ok logTip gss0_loghome 4WayReplication DA1 40 GiB 1 MiB 512 ok log BB1RGL_GPFS4_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS4_DATA1 8+2p DA1 5133 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS3_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS3_DATA1 8+2p DA1 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS2_META1 4WayReplication DA1 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA1 8+2p DA1 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS2_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS2_DATA2 8+2p DA2 13 TiB 2 MiB 32 KiB ok BB1RGL_GPFS1_META2 4WayReplication DA2 451 GiB 1 MiB 32 KiB ok BB1RGL_GPFS1_DATA2 8+2p DA2 12 TiB 1 MiB 32 KiB ok BB1RGL_GPFS5_META1 4WayReplication DA1 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA1 8+2p DA1 70 TiB 16 MiB 32 KiB ok BB1RGL_GPFS5_META2 4WayReplication DA2 750 GiB 1 MiB 32 KiB ok BB1RGL_GPFS5_DATA2 8+2p DA2 90 TiB 16 MiB 32 KiB ok config data declustered array VCD spares actual rebuild spare space remarks ------------------ ------------------ ------------- --------------------------------- ---------------- rebuild space DA1 31 34 pdisk rebuild space DA2 31 35 pdisk config data max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor vdisk max disk group fault tolerance actual disk group fault tolerance remarks ------------------ --------------------------------- --------------------------------- ---------------- gss0_logtip 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor gss0_loghome 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS4_DATA1 2 drawer 2 drawer BB1RGL_GPFS1_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS1_DATA1 2 drawer 2 drawer BB1RGL_GPFS3_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS3_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS2_DATA1 2 drawer 2 drawer BB1RGL_GPFS2_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS2_DATA2 2 drawer 2 
drawer BB1RGL_GPFS1_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS1_DATA2 2 drawer 2 drawer BB1RGL_GPFS5_META1 1 enclosure + 1 drawer 1 enclosure + 1 drawer BB1RGL_GPFS5_DATA1 2 drawer 2 drawer BB1RGL_GPFS5_META2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor BB1RGL_GPFS5_DATA2 2 drawer 2 drawer active recovery group server servers ----------------------------------------------- ------- server1 server1,server2 Atmane Khiredine HPC System Administrator | Office National de la M?t?orologie T?l : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz ________________________________ De : Laurence Horrocks-Barlow [laurence at qsplace.co.uk] Envoy? : dimanche 9 juillet 2017 09:58 ? : gpfsug main discussion list; atmane khiredine; gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] get free space in GSS You can check the recovery groups to see if there is any remaining space. I don't have access to my test system to confirm the syntax however if memory serves. Run mmlsrecoverygroup to get a list of all the recovery groups then: mmlsrecoverygroup -L This will list all your declustered arrays and their free space. Their might be another method, however this way has always worked well for me. -- Lauz On 9 July 2017 09:00:07 BST, Atmane wrote: Dear all, My name is Khiredine Atmane and I am a HPC system administrator at the National Office of Meteorology Algeria . We have a GSS24 running gss2.5.10.3-3b and gpfs-4.2.0.3. GSS configuration: 4 enclosures, 6 SSDs, 1 empty slots, 239 disks total, 0 NVRAM partitions disks = 3Tb SSD = 200 Gb df -h Filesystem Size Used Avail Use% Mounted on /dev/gpfs1 49T 18T 31T 38% /gpfs1 /dev/gpfs2 53T 13T 40T 25% /gpfs2 /dev/gpfs3 25T 4.9T 20T 21% /gpfs3 /dev/gpfs4 11T 133M 11T 1% /gpfs4 /dev/gpfs5 323T 34T 290T 11% /gpfs5 Total Is 461 To I think we have more space Could anyone make recommendation to troubleshoot find how many free space in GSS ? How to find the available space ? Thank you! Atmane -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jul 17 13:13:58 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 17 Jul 2017 12:13:58 +0000 Subject: [gpfsug-discuss] Job Vacancy: Research Storage Systems Senior Specialist/Specialist Message-ID: Hi all, Members of this group may be particularly interested in the role "Research Storage Systems Senior Specialist/Specialist"... As part of the University of Birmingham's investment in our ability to support outstanding research by providing technical computing facilities, we are expanding the team and currently have 6 vacancies. I've provided a short description of each post, but please do follow the links where you will find the full job description attached at the bottom of the page. For some of the posts, they are graded either at 7 or 8 and will be appointed based upon skills and experience, the expectation is that if the appointment is made at grade 7 that as the successful candidate grows into the role, we should be able to regrade up. 
Research Storage Systems Senior Specialist/Specialist: https://goo.gl/NsL1EG Responsible for the delivery and maintenance of research storage systems, focussed on the delivery of Spectrum Scale storage systems and data protection. (this is available either as a grade 8 or grade 7 post depending on skills and experience so may suit someone wishing to grow into the senior role) HPC Specialist post (Research Systems Administrator / Senior Research Systems Administrator): https://goo.gl/1SxM4j Helping to deliver and operationally support the technical computing environments, with a focus on supporting and delivery of HPC and HTC services. (this is available either as a grade 7 or grade 8 post depending on skills and experience so may suit someone wishing to grow into the senior role) Research Computing (Analytics): https://goo.gl/uCNdMH Helping our researchers to understand data analytics and supporting their research Senior Research Software Engineer: https://goo.gl/dcGgAz Working with research groups to develop and deliver bespoke software solutions to support their research Research Training and Engagement Officer: https://goo.gl/U48m7z Helping with the delivery and coordination of training and engagement works to support users helping ensure they are able to use the facilities to support their research. Research IT Partner in the College of Arts and Law: https://goo.gl/A7czEA Providing technical knowledge and skills to support project delivery through research bid preparation to successful solution delivery. Simon From cgirda at wustl.edu Mon Jul 17 20:40:42 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Mon, 17 Jul 2017 14:40:42 -0500 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana Message-ID: Hello Team, This is Chakri from Washu at STL. Thank you for the great opportunity to join this group. I am trying to setup performance monitoring for our GPFS cluster. As part of the project configured pmcollector and pmsensors on our GPFS cluster. 1. Created a 'spectrumscale' data-source bridge on our grafana ( NOT SET TO DEFAULT ) https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm 2. Created a new dash-board by importing the pre-built dashboard. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Importing%20predefined%20Grafana%20dashboards Here is the issue. I don't get any graph updates if I don't set "spectrumscale" as DEFAULT data-source but that is breaking rest of the graphs ( we have ton of dashboards). So I had to uncheck the "spectrumscale" as default data-source. If I go and set the "data-source" manually to "spectrumscale" on the pre-built dashboard graphs. I see the wheel spinning but no updates. Any ideas? Thank you Chakri From Robert.Oesterlin at nuance.com Tue Jul 18 12:45:38 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 18 Jul 2017 11:45:38 +0000 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana Message-ID: Hi Chakri If you?re getting the ?ole ?spinning wheel? on your dashboard, then it?s one of two things: 1) The Grafana bridge is not running 2) The dashboard is requesting a metric that isn?t available. 
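A couple of quick sanity checks from the command line first (rough sketch - the metric name, options and port here are just examples/assumptions on my part, adjust for your setup):

# 1) is the collector itself happy?
systemctl status pmcollector

# 2) does the collector actually have data for a known metric?
#    (options from memory - see the mmperfmon man page)
mmperfmon query cpu_user -n 10 -b 1

# 3) is the bridge process running and listening on its port?
#    (4242 is an assumption, use whatever port you started the bridge on)
ps -ef | grep zimonGrafanaIntf
ss -ltn | grep 4242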
Assuming that you've verified that the pmcollector/pmsensor setup is working right in your cluster, I'd then start looking at the log files for the Grafana Bridge and the pmcollector to see if you can determine if either is producing an error - like the metric wasn't found. The other thing to try is to set up a small test graph with a known metric being collected by your pmsensor configuration, rather than try one of Helene's default dashboards, which are fairly complex. Drop me a note directly if you need to. Bob Oesterlin Sr Principal Storage Engineer, Nuance From cgirda at wustl.edu Tue Jul 18 15:57:05 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Tue, 18 Jul 2017 09:57:05 -0500 Subject: [gpfsug-discuss] Setting up IBM Spectrum Scale performance monitoring bridge for Grafana In-Reply-To: References: Message-ID: Bob, Found the issue: https was getting blocked with the "direct" connection. Switched it to proxy on the bridge-port. That helped and now I can see graphs. Thank you Chakri On 7/18/17 6:45 AM, Oesterlin, Robert wrote: > Hi Chakri > > If you're getting the 'ole 'spinning wheel' on your dashboard, then it's one of two things: > > 1) The Grafana bridge is not running > 2) The dashboard is requesting a metric that isn't available. > > Assuming that you've verified that the pmcollector/pmsensor setup is working right in your cluster, I'd then start looking at the log files for the Grafana Bridge and the pmcollector to see if you can determine if either is producing an error - like the metric wasn't found. The other thing to try is to set up a small test graph with a known metric being collected by your pmsensor configuration, rather than try one of Helene's default dashboards, which are fairly complex. > > Drop me a note directly if you need to. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Tue Jul 18 18:21:06 2017 From: david_johnson at brown.edu (David Johnson) Date: Tue, 18 Jul 2017 13:21:06 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited Message-ID: We also noticed a fair amount of CPU time accumulated by mmsysmon.py on our diskless compute nodes. I read the earlier query, where it was answered: > ces == Cluster Export Services, mmsysmon.py comes from mmcesmon. It is used for managing export services of GPFS. If it is killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't attempt to kill them. > Our question is this - we don't run the latest "protocols", our NFS is CNFS, and our CIFS is clustered CIFS. I can understand it might be needed with Ganesha, but on every node? Why in the world would I be getting this daemon running on all client nodes, when I didn't install the "protocols" version of the distribution? We have release 4.2.2 at the moment. How can we disable this? Thanks, - ddj -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jul 18 18:51:21 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 18 Jul 2017 17:51:21 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: There's no official way to cleanly disable it so far as I know yet; but you can de facto disable it by deleting /var/mmfs/mmsysmon/mmsysmonitor.conf. It's a huge problem.
I don?t understand why it hasn?t been given much credit by dev or support. ~jonathon On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of David Johnson" wrote: We also noticed a fair amount of CPU time accumulated by mmsysmon.py on our diskless compute nodes. I read the earlier query, where it was answered: ces == Cluster Export Services, mmsysmon.py comes from mmcesmon. It is used for managing export services of GPFS. If it is killed, your nfs/smb etc will be out of work. Their overhead is small and they are very important. Don't attempt to kill them. Our question is this ? we don?t run the latest ?protocols", our NFS is CNFS, and our CIFS is clustered CIFS. I can understand it might be needed with Ganesha, but on every node? Why in the world would I be getting this daemon running on all client nodes, when I didn?t install the ?protocols" version of the distribution? We have release 4.2.2 at the moment. How can we disable this? Thanks, ? ddj From S.J.Thompson at bham.ac.uk Tue Jul 18 20:21:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 18 Jul 2017 19:21:46 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: So just following up on my questions from January. We tried to do 2. I.e. Restore to a new file-system with different block sizes. It got part way through creating the file-sets on the new SOBAR file-system and then GPFS asserts and crashes... We weren't actually intentionally trying to move block sizes, but because we were restoring from a traditional SAN based system to a shiny new GNR based system, we'd manually done the FS create steps. I have a PMR open now. I don't know if someone internally in IBM actually tried this after my emails, as apparently there is a similar internal defect which is ~6 months old... Simon From: > on behalf of Marc A Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 20 January 2017 at 17:57 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. From: "Simon Thompson (Research Computing - IT Services)" > To: "gpfsug-discuss at spectrumscale.org" > Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? 
We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From leslie.james.elliott at gmail.com Wed Jul 19 08:22:49 2017 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Wed, 19 Jul 2017 17:22:49 +1000 Subject: [gpfsug-discuss] AFM over NFS Message-ID: we are having a problem linking a target to a fileset we are able to manually connect with NFSv4 to the correct path on an NFS export down a particular subdirectory path, but when when we create a fileset with this same path as an afmTarget it connects with NFSv3 and actually connects to the top of the export even though mmafmctl displays the extended path information are we able to tell AFM to connect with NFSv4 in any way to work around this problem the NFS comes from a closed system, we can not change the configuration on it to fix the problem on the target thanks leslie -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Wed Jul 19 08:53:58 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 19 Jul 2017 07:53:58 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: I?m having a play with this now too. Has anybody coded a systemd unit to handle step 2b in the knowledge centre article ? bridge creation on the gpfs side? It would save me a bit of effort. I?m also wondering about the CherryPy version. It looks like this has been developed on SLES which has the newer version mentioned as a standard package and yet RHEL with an older version of CherryPy is perhaps more common as it seems to have the best support for features of GPFS, like object and block protocols. Maybe SLES is in favour now? 
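For what it's worth, the kind of thing I was going to hack up myself is roughly this - an untested sketch, where the install path of zimonGrafanaIntf.py and the collector host are pure assumptions for my setup:

# write a unit file for the bridge (path and host below are assumptions)
cat > /etc/systemd/system/zimon-grafana-bridge.service <<'EOF'
[Unit]
Description=IBM Spectrum Scale ZIMon to Grafana bridge
# start after the collector; keep retrying while the collector is still initialising
After=network.target pmcollector.service

[Service]
Type=simple
WorkingDirectory=/opt/IBM/zimon-grafana-bridge
ExecStart=/usr/bin/python /opt/IBM/zimon-grafana-bridge/zimonGrafanaIntf.py -s localhost
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable zimon-grafana-bridge.service
systemctl start zimon-grafana-bridge.service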
Cheers, Greg From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: Thursday, 6 July 2017 3:07 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data Greetings, I'm currently setting up Grafana to interact with one of our Scale Clusters and i've followed the knowledge centre link in terms of setup. https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm However while everything appears to be working i'm not seeing any data coming through the reports within the grafana server, even though I can see data in the Scale GUI The current environment: [root at sc01n02 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: sc01.spectrum GPFS cluster id: 18085710661892594990 GPFS UID domain: sc01.spectrum Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------ 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon [root at sc01n02 ~]# [root at sc01n02 ~]# mmlsconfig Configuration data for cluster sc01.spectrum: --------------------------------------------- clusterName sc01.spectrum clusterId 18085710661892594990 autoload yes profile gpfsProtocolDefaults dmapiFileHandleSize 32 minReleaseLevel 4.2.2.0 ccrEnabled yes cipherList AUTHONLY maxblocksize 16M [cesNodes] maxMBpS 5000 numaMemoryInterleave yes enforceFilesetQuotaOnRoot yes workerThreads 512 [common] tscCmdPortRange 60000-61000 cesSharedRoot /ibm/cesSharedRoot/ces cifsBypassTraversalChecking yes syncSambaMetadataOps yes cifsBypassShareLocksOnRename yes adminMode central File systems in cluster sc01.spectrum: -------------------------------------- /dev/cesSharedRoot /dev/icos_demo /dev/scale01 [root at sc01n02 ~]# [root at sc01n02 ~]# systemctl status pmcollector ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. Loaded: loaded (/etc/rc.d/init.d/pmcollector) Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago Docs: man:systemd-sysv-generator(8) Main PID: 2693 (ZIMonCollector) CGroup: /system.slice/pmcollector.service ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance mon...... May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor collector... May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance moni...r.. Hint: Some lines were ellipsized, use -l to show in full. From Grafana Server: [cid:image002.jpg at 01D300B7.CFE73E50] when I send a set of files to the cluster (3.8GB) I can see performance metrics within the Scale GUI [cid:image004.jpg at 01D300B7.CFE73E50] yet from the Grafana Dashboard im not seeing any data points [cid:image006.jpg at 01D300B7.CFE73E50] Can anyone provide some hints as to what might be happening? Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 19427 bytes Desc: image002.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 84412 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 37285 bytes Desc: image006.jpg URL: From janfrode at tanso.net Wed Jul 19 12:09:48 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 19 Jul 2017 11:09:48 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: Nils Haustein did such a migration from v7000 Unified to ESS last year. Used SOBAR to avoid recalls from HSM. I believe he wrote a whitepaper on the process.. -jf tir. 18. jul. 2017 kl. 21.21 skrev Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk>: > So just following up on my questions from January. > > We tried to do 2. I.e. Restore to a new file-system with different block > sizes. It got part way through creating the file-sets on the new SOBAR > file-system and then GPFS asserts and crashes... We weren't actually > intentionally trying to move block sizes, but because we were restoring > from a traditional SAN based system to a shiny new GNR based system, we'd > manually done the FS create steps. > > I have a PMR open now. I don't know if someone internally in IBM actually > tried this after my emails, as apparently there is a similar internal > defect which is ~6 months old... > > Simon > > From: on behalf of Marc A > Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: Friday, 20 January 2017 at 17:57 > > To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions > > I worked on some aspects of SOBAR, but without studying and testing the > commands - I'm not in a position right now to give simple definitive > answers - > having said that.... > > Generally your questions are reasonable and the answer is: "Yes it should > be possible to do that, but you might be going a bit beyond the design > point.., > so you'll need to try it out on a (smaller) test system with some smaller > tedst files. > > Point by point. > > 1. If SOBAR is unable to restore a particular file, perhaps because the > premigration did not complete -- you should only lose that particular file, > and otherwise "keep going". > > 2. I think SOBAR helps you build a similar file system to the original, > including block sizes. So you'd have to go in and tweak the file system > creation step(s). > I think this is reasonable... If you hit a problem... IMO that would be a > fair APAR. > > 3. Similar to 2. > > > > > > From: "Simon Thompson (Research Computing - IT Services)" < > S.J.Thompson at bham.ac.uk> > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 01/20/2017 10:44 AM > Subject: [gpfsug-discuss] SOBAR questions > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We've recently been looking at deploying SOBAR to support DR of some of > our file-systems, I have some questions (as ever!) that I can't see are > clearly documented, so was wondering if anyone has any insight on this. > > 1. If we elect not to premigrate certain files, are we still able to use > SOBAR? 
We are happy to take a hit that those files will never be available > again, but some are multi TB files which change daily and we can't stream > to tape effectively. > > 2. When doing a restore, does the block size of the new SOBAR'd to > file-system have to match? For example the old FS was 1MB blocks, the new > FS we create with 2MB blocks. Will this work (this strikes me as one way > we might be able to migrate an FS to a new block size?)? > > 3. If the file-system was originally created with an older GPFS code but > has since been upgraded, does restore work, and does it matter what client > code? E.g. We have a file-system that was originally 3.5.x, its been > upgraded over time to 4.2.2.0. Will this work if the client code was say > 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 > (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file > system version". Say there was 4.2.2.5 which created version 16.01 > file-system as the new FS, what would happen? > > This sort of detail is missing from: > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s > cale.v4r22.doc/bl1adv_sobarrestore.htm > > But is probably quite important for us to know! > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 19 12:26:43 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 19 Jul 2017 11:26:43 +0000 Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: Getting this: python zimonGrafanaIntf.py ?s < pmcollector host> via system is a bit of a tricky process, since this process will abort unless the pmcollector is fully up. With a large database, I?ve seen it take 3-5 mins for pmcollector to fully initialize. I?m sure a simple ?sleep and try again? wrapper would take care of that. It?s on my lengthy to-do list! On the CherryPy version - I run the bridge on my RH/Centos system with python 3.4 and used ?pip install cherrypy? and it picked up the latest version. Seems to work just fine. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Greg.Lehmann at csiro.au" Reply-To: gpfsug main discussion list Date: Wednesday, July 19, 2017 at 2:54 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data I?m having a play with this now too. Has anybody coded a systemd unit to handle step 2b in the knowledge centre article ? bridge creation on the gpfs side? It would save me a bit of effort. I?m also wondering about the CherryPy version. It looks like this has been developed on SLES which has the newer version mentioned as a standard package and yet RHEL with an older version of CherryPy is perhaps more common as it seems to have the best support for features of GPFS, like object and block protocols. Maybe SLES is in favour now? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From MDIETZ at de.ibm.com Wed Jul 19 14:05:49 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Wed, 19 Jul 2017 15:05:49 +0200 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. > > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? 
ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Wed Jul 19 14:28:23 2017 From: david_johnson at brown.edu (David Johnson) Date: Wed, 19 Jul 2017 09:28:23 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: I have opened a PMR, and the official response reflects what you just posted. In addition, it seems there are some performance issues with Python 2 that will be improved with eventual migration to Python 3. I was unaware of the mmhealth functions that the mmsysmon daemon provides. The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded. I suppose it would be possible to turn off mmsysmon during the benchmarking, but I appreciate the effort at streamlining the monitor service. Cutting back on fork/exec, better python, less polling, more notifications? all good. Thanks for the details, ? ddj > On Jul 19, 2017, at 9:05 AM, Mathias Dietz wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. 
> > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdharris at us.ibm.com Wed Jul 19 15:40:17 2017 From: mdharris at us.ibm.com (Michael D Harris) Date: Wed, 19 Jul 2017 10:40:17 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: Hi David, Re: "The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded." MPI workloads show the most mmhealth impact. Specifically the more sensitive the workload is to jitter the higher the potential impact. The mmhealth config interval, as per Mathias's link, is a scalar applied to all monitor interval values in the configuration file. As such it currently modifies the server side monitoring and health reporting in addition to mitigating mpi client impact. So "medium" == 5 is a good perhaps reasonable value - whereas the "slow" == 10 scalar may be too infrequent for your server side monitoring and reporting (so your 30 second update becomes 5 minutes). The clock alignment that Mathias mentioned is a new investigatory undocumented tool for MPI workloads. It nearly completely removes all mmhealth MPI jitter while retaining default monitor intervals. It also naturally generates thundering herds of all client reporting to the quorum nodes. So while you may mitigate the client MPI jitter you may severely impact the server throughput on those intervals if not also exceed connection and thread limits. Configuring "clients" separately from "servers" without resorting to alignment is another area of investigation. I'm not familiar with your PMR but as Mathias mentioned "mmhealth config interval medium" would be a good start. In testing that Kums and I have done the "mmhealth config interval medium" value provides mitigation almost as good as the mentioned clock alignment for MPI for say a psnap with barrier type workload . 
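For reference, the commands involved are just these (a quick sketch - check the mmhealth section of the 4.2.3 docs for the exact behaviour on your code level):

# scale all monitor polling intervals by the "medium" factor (5x, as described above)
mmhealth config interval medium

# confirm monitoring is still reporting as expected afterwards
mmhealth node show
mmhealth cluster show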
Regards, Mike Harris IBM Spectrum Scale - Core Team From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/19/2017 09:28 AM Subject: gpfsug-discuss Digest, Vol 66, Issue 30 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: mmsysmon.py revisited (Mathias Dietz) 2. Re: mmsysmon.py revisited (David Johnson) ---------------------------------------------------------------------- Message: 1 Date: Wed, 19 Jul 2017 15:05:49 +0200 From: "Mathias Dietz" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited Message-ID: Content-Type: text/plain; charset="iso-8859-1" thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. 
> > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/8c0e33e9/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 19 Jul 2017 09:28:23 -0400 From: David Johnson To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited Message-ID: Content-Type: text/plain; charset="utf-8" I have opened a PMR, and the official response reflects what you just posted. In addition, it seems there are some performance issues with Python 2 that will be improved with eventual migration to Python 3. I was unaware of the mmhealth functions that the mmsysmon daemon provides. The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded. I suppose it would be possible to turn off mmsysmon during the benchmarking, but I appreciate the effort at streamlining the monitor service. Cutting back on fork/exec, better python, less polling, more notifications? all good. Thanks for the details, ? ddj > On Jul 19, 2017, at 9:05 AM, Mathias Dietz wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. 
(mmhealth config interval) > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm < https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss < http://gpfsug.org/mailman/listinfo/gpfsug-discuss> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/669c525b/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 66, Issue 30 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathon.anderson at colorado.edu Wed Jul 19 18:52:14 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 19 Jul 2017 17:52:14 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? ~jonathon On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: thanks for the feedback. Let me clarify what mmsysmon is doing. Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. Over the last couple of month, the development team has put a strong focus on this topic. In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/18/2017 07:51 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > There?s no official way to cleanly disable it so far as I know yet; > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > mmsysmonitor.conf. > > It?s a huge problem. I don?t understand why it hasn?t been given > much credit by dev or support. 
> > ~jonathon > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of David Johnson" on behalf of david_johnson at brown.edu> wrote: > > > > > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on > our diskless compute nodes. I read the earlier query, where it > was answered: > > > > > ces == Cluster Export Services, mmsysmon.py comes from > mmcesmon. It is used for managing export services of GPFS. If it is > killed, your nfs/smb etc will be out of work. > Their overhead is small and they are very important. Don't > attempt to kill them. > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > NFS is CNFS, and our CIFS is clustered CIFS. > I can understand it might be needed with Ganesha, but on every node? > > > Why in the world would I be getting this daemon running on all > client nodes, when I didn?t install the ?protocols" version > of the distribution? We have release 4.2.2 at the moment. How > can we disable this? > > > Thanks, > ? ddj > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Wed Jul 19 19:12:37 2017 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Wed, 19 Jul 2017 14:12:37 -0400 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. 
> > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. >> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. >> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Wed Jul 19 19:29:22 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 19 Jul 2017 18:29:22 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> References: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> Message-ID: OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU for packet processing, whereas Mellanox IB uses a discrete asic on the HBA. As a result, OPA is much more sensitive to task placement and interrupts, in our experience, because the host CPU load competes with the fabric IO processing load. 
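One way to put a rough number on that sensitivity, rather than arguing from impressions, is to sample the monitor's CPU time and context switches while a benchmark is running, together with per-CPU interrupt growth. A sketch only, assuming the sysstat tools are installed and that the monitor shows up in the process table as mmsysmon.py (adjust the pgrep pattern if your install names it differently):

# CPU usage and context switches attributed to the health monitor,
# sampled every 5 seconds for two minutes
pidstat -u -w -p $(pgrep -f mmsysmon | head -1) 5 24

# per-CPU interrupt growth over a one-minute window
cat /proc/interrupts > /tmp/irq.before; sleep 60; cat /proc/interrupts > /tmp/irq.after
diff /tmp/irq.before /tmp/irq.after

Comparing those numbers between an idle node and a node running the MPI benchmark gives a first-order estimate of how much of the jitter comes from the monitor as opposed to the fabric driver.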
~jonathon On 7/19/17, 12:12 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu" wrote: We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. 
>> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. >> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From john.hearns at asml.com Thu Jul 20 08:39:29 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 20 Jul 2017 07:39:29 +0000 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: <85E16CE6-FAF6-4C8E-80A9-3ED66580BD21@brown.edu> Message-ID: This is really interesting. I know we can look at the interrupt rates of course, but is there a way we can quantify the effects of interrupts / OS jitter here? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: Wednesday, July 19, 2017 8:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmsysmon.py revisited OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU for packet processing, whereas Mellanox IB uses a discrete asic on the HBA. As a result, OPA is much more sensitive to task placement and interrupts, in our experience, because the host CPU load competes with the fabric IO processing load. ~jonathon On 7/19/17, 12:12 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu" wrote: We have FDR14 Mellanox fabric, probably similar interrupt load as OPA. -- ddj Dave Johnson On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson wrote: >> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case. > > We?ve already tried pursuing support on this through our vendor, DDN, and got no-where. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU consumption by mmsysmon on our test systems? isn?t helping. 
Do you have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. > > This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > >> It?s a huge problem. I don?t understand why it hasn?t been given > >> much credit by dev or support. > > Over the last couple of month, the development team has put a strong focus on this topic. > > In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval) > > See https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_4.2.3%2Fcom.ibm.spectrum.scale.v4r23.doc%2Fbl1adm_mmhealth.htm&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=Uzdg4ogcQwidNfi8TMp%2FdCMqnSLTFxU4y8n2ub%2F28xQ%3D&reserved=0 > In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. > > It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage. > > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > >> From: Jonathon A Anderson >> To: gpfsug main discussion list >> Date: 07/18/2017 07:51 PM >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> There?s no official way to cleanly disable it so far as I know yet; >> but you can defacto disable it by deleting /var/mmfs/mmsysmon/ >> mmsysmonitor.conf. >> >> It?s a huge problem. I don?t understand why it hasn?t been given >> much credit by dev or support. >> >> ~jonathon >> >> >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: >> >> >> >> >> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on >> our diskless compute nodes. I read the earlier query, where it >> was answered: >> >> >> >> >> ces == Cluster Export Services, mmsysmon.py comes from >> mmcesmon. It is used for managing export services of GPFS. If it is >> killed, your nfs/smb etc will be out of work. >> Their overhead is small and they are very important. Don't >> attempt to kill them. >> >> >> >> >> >> >> Our question is this ? we don?t run the latest ?protocols", our >> NFS is CNFS, and our CIFS is clustered CIFS. 
>> I can understand it might be needed with Ganesha, but on every node? >> >> >> Why in the world would I be getting this daemon running on all >> client nodes, when I didn?t install the ?protocols" version >> of the distribution? We have release 4.2.2 at the moment. How >> can we disable this? >> >> >> Thanks, >> ? ddj >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C63ff97a625e4499cb8ba08d4ced41754%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=9yatmHdcxwy%2FD%2FoZuLVDkPjpIeS9F7crTLl2MoUUIyo%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From MDIETZ at de.ibm.com Thu Jul 20 10:30:50 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Thu, 20 Jul 2017 11:30:50 +0200 Subject: [gpfsug-discuss] mmsysmon.py revisited In-Reply-To: References: Message-ID: Jonathon, its important to separate the two issues "high CPU consumption" and "CPU Jitter". As mentioned, we are aware of the CPU jitter issue and already put several improvements in place. (more to come with the next release) Did you try with a lower polling frequency and/or enabling clock alignment as Mike suggested ? Non-MPI workloads are usually not impacted by CPU jitter, but might be impacted by high CPU consumption. 
But we don't see such such high CPU consumption in the lab and therefore ask affected customers to get in contact with IBM support to find the root cause. Kind regards Mathias Dietz IBM Spectrum Scale - Release Lead Architect and RAS Architect gpfsug-discuss-bounces at spectrumscale.org wrote on 07/19/2017 07:52:14 PM: > From: Jonathon A Anderson > To: gpfsug main discussion list > Date: 07/19/2017 07:52 PM > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > It might be a problem specific to your system environment or a > wrong configuration therefore please get in contact with IBM support > to analyze the root cause of the high usage. > > I suspect it?s actually a result of frequent IO interrupts causing > jitter in conflict with MPI on the shared Intel Omni-Path network, > in our case. > > We?ve already tried pursuing support on this through our vendor, > DDN, and got no-where. Eventually we were the ones who tried killing > mmsysmon, and that fixed our problem. > > The official company line of ?we don't see significant CPU > consumption by mmsysmon on our test systems? isn?t helping. Do you > have a test system with OPA? > > ~jonathon > > > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Mathias Dietz" on behalf of MDIETZ at de.ibm.com> wrote: > > thanks for the feedback. > > Let me clarify what mmsysmon is doing. > Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for > the overall health monitoring and CES failover handling. > Even without CES it is an essential part of the system because > it monitors the individual components and provides health state > information and error events. > > This information is needed by other Spectrum Scale components > (mmhealth command, the IBM Spectrum Scale GUI, Support tools, > Install Toolkit,..) and therefore disabling mmsysmon will impact them. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > > much credit by dev or support. > > Over the last couple of month, the development team has put a > strong focus on this topic. > > In order to monitor the health of the individual components, > mmsysmon listens for notifications/callback but also has to do some polling. > We are trying to reduce the polling overhead constantly and > replace polling with notifications when possible. > > > Several improvements have been added to 4.2.3, including the > ability to configure the polling frequency to reduce the overhead. > (mmhealth config interval) > > See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/ > com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm > In addition a new option has been introduced to clock align the > monitoring threads in order to reduce CPU jitter. > > > Nevertheless, we don't see significant CPU consumption by > mmsysmon on our test systems. > > It might be a problem specific to your system environment or a > wrong configuration therefore please get in contact with IBM support > to analyze the root cause of the high usage. 
> > Kind regards > > Mathias Dietz > > IBM Spectrum Scale - Release Lead Architect and RAS Architect > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM: > > > From: Jonathon A Anderson > > To: gpfsug main discussion list > > Date: 07/18/2017 07:51 PM > > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > There?s no official way to cleanly disable it so far as I know yet; > > but you can defacto disable it by deleting /var/mmfs/mmsysmon/ > > mmsysmonitor.conf. > > > > It?s a huge problem. I don?t understand why it hasn?t been given > > much credit by dev or support. > > > > ~jonathon > > > > > > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on > > behalf of David Johnson" > on behalf of david_johnson at brown.edu> wrote: > > > > > > > > > > We also noticed a fair amount of CPU time accumulated by > mmsysmon.py on > > our diskless compute nodes. I read the earlier query, where it > > was answered: > > > > > > > > > > ces == Cluster Export Services, mmsysmon.py comes from > > mmcesmon. It is used for managing export services of GPFS. If it is > > killed, your nfs/smb etc will be out of work. > > Their overhead is small and they are very important. Don't > > attempt to kill them. > > > > > > > > > > > > > > Our question is this ? we don?t run the latest ?protocols", our > > NFS is CNFS, and our CIFS is clustered CIFS. > > I can understand it might be needed with Ganesha, but on > every node? > > > > > > Why in the world would I be getting this daemon running on all > > client nodes, when I didn?t install the ?protocols" version > > of the distribution? We have release 4.2.2 at the moment. How > > can we disable this? > > > > > > Thanks, > > ? ddj > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 15:57:14 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 09:57:14 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR Message-ID: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu> Hi There, I was running a bridge port services to push my stats to grafana. It was running fine until we started some rigorous IOPS testing on the cluster. Now its failing to start with the following error. Questions: 1. Any clues on it fix? 2. Is there anyway I can run this in a service/daemon mode rather than running in a screen session? 
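On the second question, a small systemd unit is usually enough to replace the screen session; the startup failure shown below is independent of how the bridge is launched. A sketch only, assuming the script stays in the directory it is run from today and that the collector runs on the same host; the unit name, paths and restart policy are illustrative and should be adjusted to the local install:

cat > /etc/systemd/system/zimon-grafana-bridge.service <<'EOF'
[Unit]
Description=ZIMon to Grafana bridge (zimonGrafanaIntf)
After=network-online.target pmcollector.service

[Service]
Type=simple
# adjust to wherever zimonGrafanaIntf.py actually lives
WorkingDirectory=/opt/IBM/zimon/zimonGrafanaIntf
ExecStart=/usr/bin/python zimonGrafanaIntf.py -s linuscs107.gsc.wustl.edu
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable zimon-grafana-bridge.service
systemctl start zimon-grafana-bridge.service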
[root at linuscs107 zimonGrafanaIntf]# python zimonGrafanaIntf.py -s linuscs107.gsc.wustl.edu
Failed to initialize MetadataHandler, please check log file for reason

#cat pmmonitor.log

2017-07-20 09:41:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute:
Error sending query in execute, quitting
2017-07-20 09:41:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute:
Error sending query in execute, quitting
2017-07-20 09:41:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute:
Error sending query in execute, quitting

Thank you
Chakri

From Robert.Oesterlin at nuance.com Thu Jul 20 16:06:48 2017
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Thu, 20 Jul 2017 15:06:48 +0000
Subject: [gpfsug-discuss] mmsysmon and CCR
Message-ID:

I recently ran into an issue where the frequency of mmsysmon polling (GPFS 4.2.2) was causing issues with CCR updates. I eventually ended up setting the polling interval to 30 mins (I don't have any CES), which seemed to solve the issue.

So, if you have a large cluster, be on the lookout for CCR issues, if you have that configured.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cgirda at wustl.edu Thu Jul 20 17:38:25 2017
From: cgirda at wustl.edu (Chakravarthy Girda)
Date: Thu, 20 Jul 2017 11:38:25 -0500
Subject: [gpfsug-discuss] pmmonitor - ERROR
In-Reply-To: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu>
References: <40d1f871-6081-4db3-8d6a-cec816266a00@wustl.edu>
Message-ID: <31b9b441-f51c-c0d1-11e0-b01a070f9e4e@wustl.edu>

cat zserver.log

2017-07-20 11:21:59,001 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED)
2017-07-20 11:32:29,090 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED)

Thank you
Chakri

On 7/20/17 9:57 AM, Chakravarthy Girda wrote:
> Hi There,
>
> I was running a bridge port services to push my stats to grafana. It
> was running fine until we started some rigorous IOPS testing on the
> cluster. Now its failing to start with the following error.
>
> Questions:
>
> 1. Any clues on it fix?
> 2. Is there anyway I can run this in a service/daemon mode rather than
> running in a screen session?
>
>
> [root at linuscs107 zimonGrafanaIntf]# python zimonGrafanaIntf.py -s
> linuscs107.gsc.wustl.edu
> Failed to initialize MetadataHandler, please check log file for reason
>
> #cat pmmonitor.log
>
> 2017-07-20 09:41:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute:
> Error sending query in execute, quitting
> 2017-07-20 09:41:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute:
> Error sending query in execute, quitting
> 2017-07-20 09:41:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute:
> Error sending query in execute, quitting
>
>
> Thank you
> Chakri
>
>
>
>
>

From Robert.Oesterlin at nuance.com Thu Jul 20 17:50:12 2017
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Thu, 20 Jul 2017 16:50:12 +0000
Subject: [gpfsug-discuss] pmmonitor - ERROR
Message-ID: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com>

This looks like the Grafana bridge could not connect to the pmcollector process - is it running normally? See if some of the normal "mmperfmon"
commands work and/or look at the log file on the pmcollector node. (under /var/log/zimon) You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) Bob Oesterlin Sr Principal Storage Engineer, Nuance On 7/20/17, 11:38 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Chakravarthy Girda" wrote: 2017-07-20 11:32:29,090 - zimonGrafanaIntf - ERROR - Could not initialize the QueryHandler, GetHandler::initializeTables failed (errno: None, errmsg: Unable to connect to 10.100.3.150 on port 9084, error number: 111, error code: ECONNREFUSED) From mdharris at us.ibm.com Thu Jul 20 17:55:56 2017 From: mdharris at us.ibm.com (Michael D Harris) Date: Thu, 20 Jul 2017 12:55:56 -0400 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 66, Issue 34 In-Reply-To: References: Message-ID: Hi Bob, The CCR monitor interval is addressed in 4.2.3 or 4.2.3 ptf1 Regards, Mike Harris Spectrum Scale Development - Core Team -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgirda at wustl.edu Thu Jul 20 18:12:09 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 12:12:09 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> Message-ID: Bob, Your correct. Found the issues with pmcollector services. Fixed issues with pmcollector, resolved the issues. Thank you Chakri On 7/20/17 11:50 AM, Oesterlin, Robert wrote: > You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) From cgirda at wustl.edu Thu Jul 20 18:30:03 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 12:30:03 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> Message-ID: <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> Bob, Actually the pmcollector service died in 5min. 2017-07-20 12:11:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:16:29,470 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:16:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: received zero inode/pool total size 2017-07-20 12:21:29,384 - pmmonitor - ERROR - QueryHandler: Socket connection broken, received no data 2017-07-20 12:21:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 12:21:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting 2017-07-20 12:21:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: Error sending query in execute, quitting Thank you Chakri On 7/20/17 12:12 PM, Chakravarthy Girda wrote: > Bob, > > Your correct. Found the issues with pmcollector services. Fixed issues > with pmcollector, resolved the issues. > > > Thank you > > Chakri > > > On 7/20/17 11:50 AM, Oesterlin, Robert wrote: >> You will also see this node when the pmcollector process is still initializing. 
(reading in the existing database, not ready to service requests) From cgirda at wustl.edu Thu Jul 20 21:03:56 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 15:03:56 -0500 Subject: [gpfsug-discuss] pmmonitor - ERROR In-Reply-To: <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> References: <3377EEBD-7F1A-4269-83EE-458362762457@nuance.com> <576fe053-e356-dc5c-e71d-18ef96f7ccaa@wustl.edu> Message-ID: For now I switched the "zimonGrafanaIntf" to port "4262". So far it didn't crash the pmcollector. Will wait for some more time to ensure its working. * Can we start this process in a daemon or service mode? Thank you Chakri On 7/20/17 12:30 PM, Chakravarthy Girda wrote: > Bob, > > Actually the pmcollector service died in 5min. > > 2017-07-20 12:11:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:16:29,470 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:16:29,639 - pmmonitor - WARNING - GPFSCapacityUtil: > received zero inode/pool total size > 2017-07-20 12:21:29,384 - pmmonitor - ERROR - QueryHandler: Socket > connection broken, received no data > 2017-07-20 12:21:29,384 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 12:21:29,552 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > 2017-07-20 12:21:30,047 - pmmonitor - ERROR - GPFSCapacityUtil::execute: > Error sending query in execute, quitting > > Thank you > Chakri > > > On 7/20/17 12:12 PM, Chakravarthy Girda wrote: >> Bob, >> >> Your correct. Found the issues with pmcollector services. Fixed issues >> with pmcollector, resolved the issues. >> >> >> Thank you >> >> Chakri >> >> >> On 7/20/17 11:50 AM, Oesterlin, Robert wrote: >>> You will also see this node when the pmcollector process is still initializing. (reading in the existing database, not ready to service requests) > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From cgirda at wustl.edu Thu Jul 20 21:42:09 2017 From: cgirda at wustl.edu (Chakravarthy Girda) Date: Thu, 20 Jul 2017 15:42:09 -0500 Subject: [gpfsug-discuss] zimonGrafanaIntf template variable Message-ID: <00372fdc-a0b7-26ac-84c1-aa32c78e4261@wustl.edu> Hi, I imported the pre-built grafana dashboard. https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/a180eb7e-9161-4e07-a6e4-35a0a076f7b3/attachment/5e9a5886-5bd9-4a6f-919e-bc66d16760cf/media/default%20dashboards%20set.zip Get updates from few graphs but not all. I realize that I need to update the template variables. Eg:- I get into the "File Systems View" Variable ( gpfsMetrics_fs1 ) --> Query ( gpfsMetrics_fs1 ) Regex ( /.*[^gpfs_fs_inode_used|gpfs_fs_inode_alloc|gpfs_fs_inode_free|gpfs_fs_inode_max]/ ) Question: * How can I execute the above Query and regex to fix the issues. * Is there any document on CLI options? Thank you Chakri -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jul 21 22:13:17 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 21 Jul 2017 17:13:17 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Message-ID: <28986.1500671597@turing-police.cc.vt.edu> So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive service. 
Inode size is 4K, and we had a requirement to encrypt-at-rest, so encryption is in play as well. Data is replicated 2x and fragment size is 32K. I was investigating how much data-in-inode would help deal with users who put large trees of small files into the archive (yes, I know we can use applypolicy with external programs to tarball offending directories, but that's a separate discussion ;) ## ls -ls * 64 -rw-r--r-- 1 root root 2048 Jul 21 14:47 random.data 64 -rw-r--r-- 1 root root 512 Jul 21 14:48 random.data.1 64 -rw-r--r-- 1 root root 128 Jul 21 14:50 random.data.2 64 -rw-r--r-- 1 root root 32 Jul 21 14:50 random.data.3 64 -rw-r--r-- 1 root root 16 Jul 21 14:50 random.data.4 Hmm.. I was expecting at least *some* of these to fit in the inode, and not take 2 32K blocks... ## mmlsattr -d -L random.data.4 file name: random.data.4 metadata replication: 2 max 2 data replication: 2 max 2 immutable: no appendOnly: no flags: storage pool name: system fileset name: root snapshot name: creation time: Fri Jul 21 14:50:51 2017 Misc attributes: ARCHIVE Encrypted: yes gpfs.Encryption: 0x4541 (... another 296 hex digits) EncPar 'AES:256:XTS:FEK:HMACSHA512' type: wrapped FEK WrpPar 'AES:KWRAP' CmbPar 'XORHMACSHA512' KEY-97c7f4b7-06cb-4a53-b317-1c187432dc62:archKEY1_gpfsG1 Hmm.. Doesn't *look* like enough extended attributes to prevent storing even 16 bytes in the inode, should be room for around 3.5K minus the above 250 bytes or so of attributes.... What am I missing here? Does "encrypted" or LTFS/EE disable data-in-inode? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From oehmes at gmail.com Fri Jul 21 23:04:32 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 21 Jul 2017 22:04:32 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <28986.1500671597@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: Hi, i talked with a few others to confirm this, but unfortunate this is a limitation of the code today (maybe not well documented which we will look into). Encryption only encrypts data blocks, it doesn't encrypt metadata. Hence, if encryption is enabled, we don't store data in the inode, because then it wouldn't be encrypted. For the same reason HAWC and encryption are incompatible. Sven On Fri, Jul 21, 2017 at 2:13 PM wrote: > So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive > service. > Inode size is 4K, and we had a requirement to encrypt-at-rest, so > encryption > is in play as well. Data is replicated 2x and fragment size is 32K. > > I was investigating how much data-in-inode would help deal with users who > put > large trees of small files into the archive (yes, I know we can use > applypolicy > with external programs to tarball offending directories, but that's a > separate > discussion ;) > > ## ls -ls * > 64 -rw-r--r-- 1 root root 2048 Jul 21 14:47 random.data > 64 -rw-r--r-- 1 root root 512 Jul 21 14:48 random.data.1 > 64 -rw-r--r-- 1 root root 128 Jul 21 14:50 random.data.2 > 64 -rw-r--r-- 1 root root 32 Jul 21 14:50 random.data.3 > 64 -rw-r--r-- 1 root root 16 Jul 21 14:50 random.data.4 > > Hmm.. I was expecting at least *some* of these to fit in the inode, and > not take 2 32K blocks... 
> > ## mmlsattr -d -L random.data.4 > file name: random.data.4 > metadata replication: 2 max 2 > data replication: 2 max 2 > immutable: no > appendOnly: no > flags: > storage pool name: system > fileset name: root > snapshot name: > creation time: Fri Jul 21 14:50:51 2017 > Misc attributes: ARCHIVE > Encrypted: yes > gpfs.Encryption: 0x4541 (... another 296 hex digits) > EncPar 'AES:256:XTS:FEK:HMACSHA512' > type: wrapped FEK WrpPar 'AES:KWRAP' CmbPar 'XORHMACSHA512' > KEY-97c7f4b7-06cb-4a53-b317-1c187432dc62:archKEY1_gpfsG1 > > Hmm.. Doesn't *look* like enough extended attributes to prevent storing > even > 16 bytes in the inode, should be room for around 3.5K minus the above 250 > bytes > or so of attributes.... > > What am I missing here? Does "encrypted" or LTFS/EE disable data-in-inode? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jul 21 23:24:13 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 21 Jul 2017 18:24:13 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <33069.1500675853@turing-police.cc.vt.edu> On Fri, 21 Jul 2017 22:04:32 -0000, Sven Oehme said: > i talked with a few others to confirm this, but unfortunate this is a > limitation of the code today (maybe not well documented which we will look > into). Encryption only encrypts data blocks, it doesn't encrypt metadata. > Hence, if encryption is enabled, we don't store data in the inode, because > then it wouldn't be encrypted. For the same reason HAWC and encryption are > incompatible. I can live with that restriction if it's documented better, thanks... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From p.childs at qmul.ac.uk Mon Jul 24 10:29:49 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 24 Jul 2017 09:29:49 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. Message-ID: <1500888588.571.3.camel@qmul.ac.uk> We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. From ilan84 at gmail.com Mon Jul 24 11:36:41 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Mon, 24 Jul 2017 13:36:41 +0300 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication Message-ID: Hi, I have gpfs with 2 Nodes (redhat). 
I am trying to create NFS share - So I would be able to mount and access it from another linux machine. I receive error: Current authentication: none is invalid. What do i need to configure ? PLEASE NOTE: I dont have the SMB package at the moment, I dont want authentication on the NFS export.. While trying to create NFS (I execute the following): [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" I receive the following error: [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. [root at LH20-GPFS1 ~]# mmuserauth service list FILE access not configured PARAMETERS VALUES ------------------------------------------------- OBJECT access not configured PARAMETERS VALUES ------------------------------------------------- [root at LH20-GPFS1 ~]# Some additional information on cluster: ============================== [root at LH20-GPFS1 ~]# mmlsmgr file system manager node ---------------- ------------------ fs_gpfs01 10.10.158.61 (LH20-GPFS1) Cluster manager node: 10.10.158.61 (LH20-GPFS1) [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: LH20-GPFS1 GPFS cluster id: 10777108240438931454 GPFS UID domain: LH20-GPFS1 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 quorum From jonathan at buzzard.me.uk Mon Jul 24 12:43:10 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Jul 2017 12:43:10 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <28986.1500671597@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <1500896590.4387.167.camel@buzzard.me.uk> On Fri, 2017-07-21 at 17:13 -0400, valdis.kletnieks at vt.edu wrote: > So we're running GPFS 4.2.2.3 and LTFS/EE 1.2.3 to use as an archive service. > Inode size is 4K, and we had a requirement to encrypt-at-rest, so encryption > is in play as well. Data is replicated 2x and fragment size is 32K. > For an archive service how about only accepting files in actual "archive" formats and then severely restricting the number of files a user can have? By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. Has a number of effects. Firstly it makes the files "big" so they move to tape efficiently. It also makes it less likely the end user will try and use it as an general purpose file server. As it's an archive there should be no problem for the user to bundle all the files into a .zip file or similar. Noting that Windows Vista and up handle ZIP64 files getting around the older 4GB and 65k files limit. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
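Back on the NFS export question earlier in this digest: the "Current authentication: none is invalid" message is CES refusing to create any export until file authentication has been configured, even when no directory service is wanted. A commonly used minimal sequence for that situation is sketched below; it assumes the "userdefined" mode (UIDs and GIDs are taken as-is from the clients) is acceptable, so check the mmuserauth documentation for your release before relying on it:

mmuserauth service create --data-access-method file --type userdefined
mmuserauth service list
mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)"
mmnfs export list

With userdefined authentication there is no external identity mapping, so the numeric UIDs and GIDs on the NFS clients need to match the ones on the GPFS nodes.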
From stefan.dietrich at desy.de Mon Jul 24 13:19:47 2017 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Mon, 24 Jul 2017 14:19:47 +0200 (CEST) Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data In-Reply-To: References: Message-ID: <1981958989.2609398.1500898787132.JavaMail.zimbra@desy.de> Yep, have look at this Gist [1] The unit files assumes some paths and users, which are created during the installation of my RPM. [1] https://gist.github.com/stdietrich/b3b985f872ea648d6c03bb6249c44e72 Regards, Stefan ----- Original Message ----- > From: "Greg Lehmann" > To: gpfsug-discuss at spectrumscale.org > Sent: Wednesday, July 19, 2017 9:53:58 AM > Subject: Re: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no data > I?m having a play with this now too. Has anybody coded a systemd unit to handle > step 2b in the knowledge centre article ? bridge creation on the gpfs side? It > would save me a bit of effort. > > > > I?m also wondering about the CherryPy version. It looks like this has been > developed on SLES which has the newer version mentioned as a standard package > and yet RHEL with an older version of CherryPy is perhaps more common as it > seems to have the best support for features of GPFS, like object and block > protocols. Maybe SLES is in favour now? > > > > Cheers, > > > > Greg > > > > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie > Sent: Thursday, 6 July 2017 3:07 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Scale / Perfmon / Grafana - services running but no > data > > > > > Greetings, > > > > > > > > > I'm currently setting up Grafana to interact with one of our Scale Clusters > > > and i've followed the knowledge centre link in terms of setup. 
> > > > > > [ > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm > | > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_setuppmbridgeforgrafana.htm > ] > > > > > > However while everything appears to be working i'm not seeing any data coming > through the reports within the grafana server, even though I can see data in > the Scale GUI > > > > > > The current environment: > > > > > > [root at sc01n02 ~]# mmlscluster > > > GPFS cluster information > ======================== > GPFS cluster name: sc01.spectrum > GPFS cluster id: 18085710661892594990 > GPFS UID domain: sc01.spectrum > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > > Node Daemon node name IP address Admin node name Designation > ------------------------------------------------------------------ > 1 sc01n01 10.2.12.11 sc01n01 quorum-manager-perfmon > 2 sc01n02 10.2.12.12 sc01n02 quorum-manager-perfmon > 3 sc01n03 10.2.12.13 sc01n03 quorum-manager-perfmon > > > [root at sc01n02 ~]# > > > > > > > > > [root at sc01n02 ~]# mmlsconfig > Configuration data for cluster sc01.spectrum: > --------------------------------------------- > clusterName sc01.spectrum > clusterId 18085710661892594990 > autoload yes > profile gpfsProtocolDefaults > dmapiFileHandleSize 32 > minReleaseLevel 4.2.2.0 > ccrEnabled yes > cipherList AUTHONLY > maxblocksize 16M > [cesNodes] > maxMBpS 5000 > numaMemoryInterleave yes > enforceFilesetQuotaOnRoot yes > workerThreads 512 > [common] > tscCmdPortRange 60000-61000 > cesSharedRoot /ibm/cesSharedRoot/ces > cifsBypassTraversalChecking yes > syncSambaMetadataOps yes > cifsBypassShareLocksOnRename yes > adminMode central > > > File systems in cluster sc01.spectrum: > -------------------------------------- > /dev/cesSharedRoot > /dev/icos_demo > /dev/scale01 > [root at sc01n02 ~]# > > > > > > > > > [root at sc01n02 ~]# systemctl status pmcollector > ? pmcollector.service - LSB: Start the ZIMon performance monitor collector. > Loaded: loaded (/etc/rc.d/init.d/pmcollector) > Active: active (running) since Tue 2017-05-30 08:46:32 AEST; 1 months 6 days ago > Docs: man:systemd-sysv-generator(8) > Main PID: 2693 (ZIMonCollector) > CGroup: /system.slice/pmcollector.service > ??2693 /opt/IBM/zimon/ZIMonCollector -C /opt/IBM/zimon/ZIMonCollector.cfg... > ??2698 python /opt/IBM/zimon/bin/pmmonitor.py -f /opt/IBM/zimon/syshealth... > > > May 30 08:46:32 sc01n02 systemd[1]: Starting LSB: Start the ZIMon performance > mon...... > May 30 08:46:32 sc01n02 pmcollector[2584]: Starting performance monitor > collector... > May 30 08:46:32 sc01n02 systemd[1]: Started LSB: Start the ZIMon performance > moni...r.. > Hint: Some lines were ellipsized, use -l to show in full. > > > > > > From Grafana Server: > > > > > > > > > > > > > > > when I send a set of files to the cluster (3.8GB) I can see performance metrics > within the Scale GUI > > > > > > > > > > > > yet from the Grafana Dashboard im not seeing any data points > > > > > > > > > > > > Can anyone provide some hints as to what might be happening? 
> Regards,
>
> Andrew Beattie
> Software Defined Storage - IT Specialist
> Phone: 614-2133-7927
> E-mail: abeattie at au1.ibm.com
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From jjdoherty at yahoo.com Mon Jul 24 14:11:12 2017
From: jjdoherty at yahoo.com (Jim Doherty)
Date: Mon, 24 Jul 2017 13:11:12 +0000 (UTC)
Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why.
References: <261384244.3866909.1500901872347.ref@mail.yahoo.com>
Message-ID: <261384244.3866909.1500901872347@mail.yahoo.com>

There are 3 places that the GPFS mmfsd uses memory: the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc. The statistics for memory pool id 2 are where the maxFilesToCache/maxStatCache objects are, and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.

You might want to upgrade to a later PTF, as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.


On Monday, July 24, 2017 5:29 AM, Peter Childs wrote:

We have two GPFS clusters.

One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.)

The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in their memory usage, starting at about 1.1G and are fine for a few days, however after a while they grow to 4.2G, which when the node needs to run real work, means the work can't be done.

I'm losing track of what may be different other than CCR, and I'm trying to find some more ideas of where to look.

I've checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old)

I'm not sure what else to look at on this one hence why I'm asking the community.

Thanks in advance

Peter Childs
ITS Research Storage
Queen Mary University of London.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From p.childs at qmul.ac.uk Mon Jul 24 14:30:49 2017
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Mon, 24 Jul 2017 13:30:49 +0000
Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why.
In-Reply-To: <261384244.3866909.1500901872347@mail.yahoo.com>
References: <261384244.3866909.1500901872347.ref@mail.yahoo.com>
 <261384244.3866909.1500901872347@mail.yahoo.com>
Message-ID: <1500903047.571.7.camel@qmul.ac.uk>

I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory, and does not seem to account for the excessive memory usage.

The new machines do have idleSocketTimeout set to 0; from what you're saying it could be related to keeping that many connections between nodes working.

Thanks in advance

Peter.
[root at dn29 ~]# mmdiag --memory

=== mmdiag: memory ===
mmfsd heap size: 2039808 bytes

Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
         128 bytes in use
 17500049370 hard limit on memory usage
     1048576 bytes committed to regions
           1 number of regions
         555 allocations
         555 frees
           0 allocation failures

Statistics for MemoryPool id 2 ("Shared Segment")
    42179592 bytes in use
 17500049370 hard limit on memory usage
    56623104 bytes committed to regions
           9 number of regions
      100027 allocations
       79624 frees
           0 allocation failures

Statistics for MemoryPool id 3 ("Token Manager")
     2099520 bytes in use
 17500049370 hard limit on memory usage
    16778240 bytes committed to regions
           1 number of regions
           4 allocations
           0 frees
           0 allocation failures


On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote:

There are 3 places that the GPFS mmfsd uses memory: the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc. The statistics for memory pool id 2 are where the maxFilesToCache/maxStatCache objects are, and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.

You might want to upgrade to a later PTF, as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.


On Monday, July 24, 2017 5:29 AM, Peter Childs wrote:

We have two GPFS clusters.

One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.)

The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in their memory usage, starting at about 1.1G and are fine for a few days, however after a while they grow to 4.2G, which when the node needs to run real work, means the work can't be done.

I'm losing track of what may be different other than CCR, and I'm trying to find some more ideas of where to look.

I've checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old)

I'm not sure what else to look at on this one hence why I'm asking the community.

Thanks in advance

Peter Childs
ITS Research Storage
Queen Mary University of London.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
--
Peter Childs
ITS Research Storage
Queen Mary, University of London
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jjdoherty at yahoo.com Mon Jul 24 15:10:45 2017
From: jjdoherty at yahoo.com (Jim Doherty)
Date: Mon, 24 Jul 2017 14:10:45 +0000 (UTC)
Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why.
In-Reply-To: <1500903047.571.7.camel@qmul.ac.uk>
References: <261384244.3866909.1500901872347.ref@mail.yahoo.com>
 <261384244.3866909.1500901872347@mail.yahoo.com>
 <1500903047.571.7.camel@qmul.ac.uk>
Message-ID: <1770436429.3911327.1500905445052@mail.yahoo.com>

How are you identifying the high memory usage?


On Monday, July 24, 2017 9:30 AM, Peter Childs wrote:

I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory, and does not seem to account for the excessive memory usage.
The new machines do have idleSocketTimeout set to 0; from what you're saying it could be related to keeping that many connections between nodes working.

Thanks in advance

Peter.

[root at dn29 ~]# mmdiag --memory

=== mmdiag: memory ===
mmfsd heap size: 2039808 bytes

Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
         128 bytes in use
 17500049370 hard limit on memory usage
     1048576 bytes committed to regions
           1 number of regions
         555 allocations
         555 frees
           0 allocation failures

Statistics for MemoryPool id 2 ("Shared Segment")
    42179592 bytes in use
 17500049370 hard limit on memory usage
    56623104 bytes committed to regions
           9 number of regions
      100027 allocations
       79624 frees
           0 allocation failures

Statistics for MemoryPool id 3 ("Token Manager")
     2099520 bytes in use
 17500049370 hard limit on memory usage
    16778240 bytes committed to regions
           1 number of regions
           4 allocations
           0 frees
           0 allocation failures

On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote:

There are 3 places that the GPFS mmfsd uses memory: the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc. The statistics for memory pool id 2 are where the maxFilesToCache/maxStatCache objects are, and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.

You might want to upgrade to a later PTF, as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.

On Monday, July 24, 2017 5:29 AM, Peter Childs wrote:

We have two GPFS clusters.

One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.)

The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in their memory usage, starting at about 1.1G and are fine for a few days, however after a while they grow to 4.2G, which when the node needs to run real work, means the work can't be done.

I'm losing track of what may be different other than CCR, and I'm trying to find some more ideas of where to look.

I've checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old)

I'm not sure what else to look at on this one hence why I'm asking the community.

Thanks in advance

Peter Childs
ITS Research Storage
Queen Mary University of London.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
--
Peter Childs
ITS Research Storage
Queen Mary, University of London

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From p.childs at qmul.ac.uk Mon Jul 24 15:21:27 2017
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Mon, 24 Jul 2017 14:21:27 +0000
Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why.
In-Reply-To: <1770436429.3911327.1500905445052@mail.yahoo.com>
References: <261384244.3866909.1500901872347.ref@mail.yahoo.com>
 <261384244.3866909.1500901872347@mail.yahoo.com>
 <1500903047.571.7.camel@qmul.ac.uk>
 <1770436429.3911327.1500905445052@mail.yahoo.com>
Message-ID: <1500906086.571.9.camel@qmul.ac.uk>

top

but ps gives the same value.

[root at dn29 ~]# ps auww -q 4444
USER       PID %CPU %MEM      VSZ     RSS TTY  STAT START  TIME COMMAND
root      4444  2.7 22.3 10537600 5472580 ?    S                /usr/lpp/mmfs/bin/mmfsd

Thanks for the help

Peter.

On Mon, 2017-07-24 at 14:10 +0000, Jim Doherty wrote:

How are you identifying the high memory usage?

On Monday, July 24, 2017 9:30 AM, Peter Childs wrote:

I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory, and does not seem to account for the excessive memory usage.

The new machines do have idleSocketTimeout set to 0; from what you're saying it could be related to keeping that many connections between nodes working.

Thanks in advance

Peter.

[root at dn29 ~]# mmdiag --memory

=== mmdiag: memory ===
mmfsd heap size: 2039808 bytes

Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
         128 bytes in use
 17500049370 hard limit on memory usage
     1048576 bytes committed to regions
           1 number of regions
         555 allocations
         555 frees
           0 allocation failures

Statistics for MemoryPool id 2 ("Shared Segment")
    42179592 bytes in use
 17500049370 hard limit on memory usage
    56623104 bytes committed to regions
           9 number of regions
      100027 allocations
       79624 frees
           0 allocation failures

Statistics for MemoryPool id 3 ("Token Manager")
     2099520 bytes in use
 17500049370 hard limit on memory usage
    16778240 bytes committed to regions
           1 number of regions
           4 allocations
           0 frees
           0 allocation failures

On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote:

There are 3 places that the GPFS mmfsd uses memory: the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc. The statistics for memory pool id 2 are where the maxFilesToCache/maxStatCache objects are, and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.

You might want to upgrade to a later PTF, as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops.

On Monday, July 24, 2017 5:29 AM, Peter Childs wrote:

We have two GPFS clusters.

One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.)

The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in their memory usage, starting at about 1.1G and are fine for a few days, however after a while they grow to 4.2G, which when the node needs to run real work, means the work can't be done.

I'm losing track of what may be different other than CCR, and I'm trying to find some more ideas of where to look.

I've checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old)

I'm not sure what else to look at on this one hence why I'm asking the community.

Thanks in advance

Peter Childs
ITS Research Storage
Queen Mary University of London.
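For anyone else chasing this sort of growth: one rough way to narrow down where the resident set is actually going (pagepool vs. the shared segments vs. the daemon heap) is to snapshot the per-mapping breakdown of mmfsd over time and compare successive snapshots. The interval and log path below are arbitrary, so treat this as a sketch rather than a recommended procedure:

# append the 20 largest mappings of mmfsd to a log once an hour
while sleep 3600; do
    date >> /var/tmp/mmfsd-rss.log
    pmap -x $(pidof mmfsd) | sort -n -k3 | tail -20 >> /var/tmp/mmfsd-rss.log
done

If the growth turns out to be in the anonymous heap rather than in the pagepool or the shared segments, that points at the daemon itself rather than at maxFilesToCache/maxStatCache sizing; smem (suggested elsewhere in the thread) gives a similar per-process view.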
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.huffman at crick.ac.uk Mon Jul 24 15:40:51 2017 From: adam.huffman at crick.ac.uk (Adam Huffman) Date: Mon, 24 Jul 2017 14:40:51 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1500906086.571.9.camel@qmul.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> <1500906086.571.9.camel@qmul.ac.uk> Message-ID: <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> smem is recommended here Cheers, Adam -- Adam Huffman Senior HPC and Cloud Systems Engineer The Francis Crick Institute 1 Midland Road London NW1 1AT T: 020 3796 1175 E: adam.huffman at crick.ac.uk W: www.crick.ac.uk On 24 Jul 2017, at 15:21, Peter Childs > wrote: top but ps gives the same value. [root at dn29 ~]# ps auww -q 4444 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 4444 2.7 22.3 10537600 5472580 ? S> wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory. and does not seam to account for the excessive memory usage. The new machines do have idleSocketTimout set to 0 from what your saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter. [root at dn29 ~]# mmdiag --memory === mmdiag: memory === mmfsd heap size: 2039808 bytes Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") 128 bytes in use 17500049370 hard limit on memory usage 1048576 bytes committed to regions 1 number of regions 555 allocations 555 frees 0 allocation failures Statistics for MemoryPool id 2 ("Shared Segment") 42179592 bytes in use 17500049370 hard limit on memory usage 56623104 bytes committed to regions 9 number of regions 100027 allocations 79624 frees 0 allocation failures Statistics for MemoryPool id 3 ("Token Manager") 2099520 bytes in use 17500049370 hard limit on memory usage 16778240 bytes committed to regions 1 number of regions 4 allocations 0 frees 0 allocation failures On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments run the command mmfsadm dump malloc . The statistics for memory pool id 2 is where maxFilesToCache/maxStatCache objects are and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to later PTF as there was a PTF to fix a memory leak that occurred in tscomm associated with network connection drops. 
On Monday, July 24, 2017 5:29 AM, Peter Childs > wrote: We have two GPFS clusters. One is fairly old and running 4.2.1-2 and non CCR and the nodes run fine using up about 1.5G of memory and is consistent (GPFS pagepool is set to 1G, so that looks about right.) The other one is "newer" running 4.2.1-3 with CCR and the nodes keep increasing in there memory usage, starting at about 1.1G and are find for a few days however after a while they grow to 4.2G which when the node need to run real work, means the work can't be done. I'm losing track of what maybe different other than CCR, and I'm trying to find some more ideas of where to look. I'm checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000), workerThreads is set to 128 on the new gpfs cluster (against default 48 on the old) I'm not sure what else to look at on this one hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From jamiedavis at us.ibm.com Mon Jul 24 15:45:26 2017 From: jamiedavis at us.ibm.com (James Davis) Date: Mon, 24 Jul 2017 14:45:26 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <33069.1500675853@turing-police.cc.vt.edu> References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: attlisjw.dat Type: application/octet-stream Size: 497 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jul 24 15:50:57 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 24 Jul 2017 14:50:57 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> Message-ID: I suppose the distinction between data, metadata and data IN metadata could be made. Whilst it is clear to me (us) now, perhaps the thought was that the data would be encrypted even if it was stored inside the metadata. My two pence. 
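As an aside, a quick way to tell whether a particular small file has actually landed in the inode (and therefore whether this caveat applies to it at all) is that such files typically report a non-zero size but zero allocated blocks. A rough check, with the path below only as an example:

# size in bytes vs. allocated 512-byte blocks; a non-zero size with 0 blocks
# suggests the data is being held in the inode rather than in data blocks
stat -c '%n: %s bytes, %b blocks' /gpfs/fs1/some-small-file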
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of James Davis Sent: 24 July 2017 15:45 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Hey all, On the documentation of encryption restrictions and encryption/HAWC interplay... The encryption documentation currently states: "Secure storage uses encryption to make data unreadable to anyone who does not possess the necessary encryption keys...Only data, not metadata, is encrypted." The HAWC restrictions include: "Encrypted data is never stored in the recovery log..." If this is unclear, I'm open to suggestions for improvements. Cordially, Jamie ----- Original message ----- From: valdis.kletnieks at vt.edu Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Date: Fri, Jul 21, 2017 6:24 PM On Fri, 21 Jul 2017 22:04:32 -0000, Sven Oehme said: > i talked with a few others to confirm this, but unfortunate this is a > limitation of the code today (maybe not well documented which we will look > into). Encryption only encrypts data blocks, it doesn't encrypt metadata. > Hence, if encryption is enabled, we don't store data in the inode, because > then it wouldn't be encrypted. For the same reason HAWC and encryption are > incompatible. I can live with that restriction if it's documented better, thanks... [Document Icon]attq4saq.dat Type: application/pgp-signature Name: attq4saq.dat _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Mon Jul 24 15:57:13 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Jul 2017 15:57:13 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <33069.1500675853@turing-police.cc.vt.edu> , <28986.1500671597@turing-police.cc.vt.edu> Message-ID: <1500908233.4387.194.camel@buzzard.me.uk> On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: > Hey all, > > On the documentation of encryption restrictions and encryption/HAWC > interplay... > > The encryption documentation currently states: > > "Secure storage uses encryption to make data unreadable to anyone who > does not possess the necessary encryption keys...Only data, not > metadata, is encrypted." > > The HAWC restrictions include: > > "Encrypted data is never stored in the recovery log..." > > If this is unclear, I'm open to suggestions for improvements. > Just because *DATA* is stored in the metadata does not make it magically metadata. It's still data so you could quite reasonably conclude that it is encrypted. We have now been disabused of this, but the documentation is not clear and needs clarifying. Perhaps say metadata blocks are not encrypted. Or just a simple data stored in inodes is not encrypted would suffice. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Mon Jul 24 16:49:07 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Jul 2017 11:49:07 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? 
In-Reply-To: <1500896590.4387.167.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> Message-ID: <17702.1500911347@turing-police.cc.vt.edu> On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > For an archive service how about only accepting files in actual > "archive" formats and then severely restricting the number of files a > user can have? > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. After having dealt with users who fill up disk storage for almost 4 decades now, I'm fully aware of those advantages. :) ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" in 1978, and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ square feet, and now 8T drives are all over the place...) On the flip side, my current project is migrating 5 petabytes of data from our old archive system that didn't have such rules (mostly due to politics and the fact that the underlying XFS filesystem uses a 4K blocksize so it wasn't as big an issue), so I'm stuck with what people put in there years ago. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Mon Jul 24 16:49:26 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 24 Jul 2017 15:49:26 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: Ilan, you must create some type of authentication mechanism for CES to work properly first. If you want a quick and dirty way that would just use your local /etc/passwd try this. /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined Mark -----Original Message----- From: Ilan Schwarts [mailto:ilan84 at gmail.com] Sent: Monday, July 24, 2017 5:37 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication Hi, I have gpfs with 2 Nodes (redhat). I am trying to create NFS share - So I would be able to mount and access it from another linux machine. I receive error: Current authentication: none is invalid. What do i need to configure ? PLEASE NOTE: I dont have the SMB package at the moment, I dont want authentication on the NFS export.. While trying to create NFS (I execute the following): [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" I receive the following error: [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. 
[root at LH20-GPFS1 ~]# mmuserauth service list FILE access not configured PARAMETERS VALUES ------------------------------------------------- OBJECT access not configured PARAMETERS VALUES ------------------------------------------------- [root at LH20-GPFS1 ~]# Some additional information on cluster: ============================== [root at LH20-GPFS1 ~]# mmlsmgr file system manager node ---------------- ------------------ fs_gpfs01 10.10.158.61 (LH20-GPFS1) Cluster manager node: 10.10.158.61 (LH20-GPFS1) [root at LH20-GPFS1 ~]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 LH20-GPFS1 active 3 LH20-GPFS2 active [root at LH20-GPFS1 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: LH20-GPFS1 GPFS cluster id: 10777108240438931454 GPFS UID domain: LH20-GPFS1 Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------- 1 LH20-GPFS1 10.10.158.61 LH20-GPFS1 quorum-manager 3 LH20-GPFS2 10.10.158.62 LH20-GPFS2 quorum _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions From valdis.kletnieks at vt.edu Mon Jul 24 17:35:34 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Jul 2017 12:35:34 -0400 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: <27469.1500914134@turing-police.cc.vt.edu> On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: > Hi, > I have gpfs with 2 Nodes (redhat). > I am trying to create NFS share - So I would be able to mount and > access it from another linux machine. > While trying to create NFS (I execute the following): > [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* > Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" You can get away with little to no authentication for NFSv3, but not for NFSv4. Try with Protocols=3 only and mmuserauth service create --type userdefined that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS client tells you". This of course only works sanely if each NFS export is only to a set of machines in the same administrative domain that manages their UID/GIDs. Exporting to two sets of machines that don't coordinate their UID/GID space is, of course, where hilarity and hijinks ensue.... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From luke.raimbach at googlemail.com Mon Jul 24 23:23:03 2017 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Mon, 24 Jul 2017 22:23:03 +0000 Subject: [gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why. In-Reply-To: <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> References: <261384244.3866909.1500901872347.ref@mail.yahoo.com> <261384244.3866909.1500901872347@mail.yahoo.com> <1500903047.571.7.camel@qmul.ac.uk> <1770436429.3911327.1500905445052@mail.yahoo.com> <1500906086.571.9.camel@qmul.ac.uk> <1CC632F0-55DB-4185-8177-B814A2F8A874@crick.ac.uk> Message-ID: Switch of CCR and see what happens. On Mon, 24 Jul 2017, 15:40 Adam Huffman, wrote: > smem is recommended here > > Cheers, > Adam > > -- > > Adam Huffman > Senior HPC and Cloud Systems Engineer > The Francis Crick Institute > 1 Midland Road > London NW1 1AT > > T: 020 3796 1175 > E: adam.huffman at crick.ac.uk > W: www.crick.ac.uk > > > > > > On 24 Jul 2017, at 15:21, Peter Childs wrote: > > > top > > but ps gives the same value. > > [root at dn29 ~]# ps auww -q 4444 > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > root 4444 2.7 22.3 10537600 5472580 ? S /usr/lpp/mmfs/bin/mmfsd > > Thanks for the help > > Peter. > > > On Mon, 2017-07-24 at 14:10 +0000, Jim Doherty wrote: > > How are you identifying the high memory usage? > > > On Monday, July 24, 2017 9:30 AM, Peter Childs > wrote: > > > I've had a look at mmfsadm dump malloc and it looks to agree with the > output from mmdiag --memory. and does not seam to account for the excessive > memory usage. > > The new machines do have idleSocketTimout set to 0 from what your saying > it could be related to keeping that many connections between nodes working. > > Thanks in advance > > Peter. > > > > > [root at dn29 ~]# mmdiag --memory > > === mmdiag: memory === > mmfsd heap size: 2039808 bytes > > > Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)") > 128 bytes in use > 17500049370 hard limit on memory usage > 1048576 bytes committed to regions > 1 number of regions > 555 allocations > 555 frees > 0 allocation failures > > > Statistics for MemoryPool id 2 ("Shared Segment") > 42179592 bytes in use > 17500049370 hard limit on memory usage > 56623104 bytes committed to regions > 9 number of regions > 100027 allocations > 79624 frees > 0 allocation failures > > > Statistics for MemoryPool id 3 ("Token Manager") > 2099520 bytes in use > 17500049370 hard limit on memory usage > 16778240 bytes committed to regions > 1 number of regions > 4 allocations > 0 frees > 0 allocation failures > > > On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote: > > There are 3 places that the GPFS mmfsd uses memory the pagepool plus 2 > shared memory segments. To see the memory utilization of the shared > memory segments run the command mmfsadm dump malloc . The statistics > for memory pool id 2 is where maxFilesToCache/maxStatCache objects are > and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. > > You might want to upgrade to later PTF as there was a PTF to fix a memory > leak that occurred in tscomm associated with network connection drops. > > > On Monday, July 24, 2017 5:29 AM, Peter Childs > wrote: > > > We have two GPFS clusters. > > One is fairly old and running 4.2.1-2 and non CCR and the nodes run > fine using up about 1.5G of memory and is consistent (GPFS pagepool is > set to 1G, so that looks about right.) 
> > The other one is "newer" running 4.2.1-3 with CCR and the nodes keep > increasing in there memory usage, starting at about 1.1G and are find > for a few days however after a while they grow to 4.2G which when the > node need to run real work, means the work can't be done. > > I'm losing track of what maybe different other than CCR, and I'm trying > to find some more ideas of where to look. > > I'm checked all the standard things like pagepool and maxFilesToCache > (set to the default of 4000), workerThreads is set to 128 on the new > gpfs cluster (against default 48 on the old) > > I'm not sure what else to look at on this one hence why I'm asking the > community. > > Thanks in advance > > Peter Childs > ITS Research Storage > Queen Mary University of London. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > The Francis Crick Institute Limited is a registered charity in England and > Wales no. 1140062 and a company registered in England and Wales no. > 06885462, with its registered office at 1 Midland Road London NW1 1AT > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilan84 at gmail.com Tue Jul 25 05:52:11 2017 From: ilan84 at gmail.com (Ilan Schwarts) Date: Tue, 25 Jul 2017 07:52:11 +0300 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: <27469.1500914134@turing-police.cc.vt.edu> References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). 
>> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts From ulmer at ulmer.org Tue Jul 25 06:33:13 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Tue, 25 Jul 2017 01:33:13 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500908233.4387.194.camel@buzzard.me.uk> References: <33069.1500675853@turing-police.cc.vt.edu> <28986.1500671597@turing-police.cc.vt.edu> <1500908233.4387.194.camel@buzzard.me.uk> Message-ID: <1233C5A4-A8C9-4A56-AEC3-AE65DBB5D346@ulmer.org> > On Jul 24, 2017, at 10:57 AM, Jonathan Buzzard > wrote: > > On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: >> Hey all, >> >> On the documentation of encryption restrictions and encryption/HAWC >> interplay... >> >> The encryption documentation currently states: >> >> "Secure storage uses encryption to make data unreadable to anyone who >> does not possess the necessary encryption keys...Only data, not >> metadata, is encrypted." >> >> The HAWC restrictions include: >> >> "Encrypted data is never stored in the recovery log..." >> >> If this is unclear, I'm open to suggestions for improvements. >> > > Just because *DATA* is stored in the metadata does not make it magically > metadata. It's still data so you could quite reasonably conclude that it > is encrypted. > [?] > JAB. +1. Also, "Encrypted data is never stored in the recovery log?" does not make it clear whether: The data that is supposed to be encrypted is not written to the recovery log. The data that is supposed to be encrypted is written to the recovery log, but is not encrypted there. Thanks, -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Tue Jul 25 10:02:14 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Jul 2017 10:02:14 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <17702.1500911347@turing-police.cc.vt.edu> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> Message-ID: <1500973334.4387.201.camel@buzzard.me.uk> On Mon, 2017-07-24 at 11:49 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > > > For an archive service how about only accepting files in actual > > "archive" formats and then severely restricting the number of files a > > user can have? > > > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. 
> > After having dealt with users who fill up disk storage for almost 4 decades > now, I'm fully aware of those advantages. :) > > ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" in 1978, > and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ square feet, and > now 8T drives are all over the place...) > > On the flip side, my current project is migrating 5 petabytes of data from our > old archive system that didn't have such rules (mostly due to politics and the > fact that the underlying XFS filesystem uses a 4K blocksize so it wasn't as big > an issue), so I'm stuck with what people put in there years ago. I would be tempted to zip up the directories and move them ziped ;-) JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From john.hearns at asml.com Tue Jul 25 10:30:28 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 25 Jul 2017 09:30:28 +0000 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500973334.4387.201.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: I agree with Jonathan. In my experience, if you look at why there are many small files being stored by researchers, these are either the results of data acquisition - high speed cameras, microscopes, or in my experience a wind tunnel. Or the images are a sequence of images produced by a simulation which are later post-processed into a movie or Ensight/Paraview format. When questioned, the resaechers will always say "but I would like to keep this data available just in case". In reality those files are never looked at again. And as has been said if you have a tape based archiving system you could end up with thousands of small files being spread all over your tapes. So it is legitimate to make zips / tars of directories like that. I am intrigued to see that GPFS has a policy facility which can call an external program. That is useful. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Buzzard Sent: Tuesday, July 25, 2017 11:02 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? On Mon, 2017-07-24 at 11:49 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Jul 2017 12:43:10 +0100, Jonathan Buzzard said: > > > For an archive service how about only accepting files in actual > > "archive" formats and then severely restricting the number of files > > a user can have? > > > > By archive files I am thinking like a .zip, tar.gz, tar.bz or similar. > > After having dealt with users who fill up disk storage for almost 4 > decades now, I'm fully aware of those advantages. :) > > ( /me ponders when an IBM 2314 disk pack with 27M of space was "a lot" > in 1978, and when we moved 2 IBM mainframes in 1989, 400G took 2,500+ > square feet, and now 8T drives are all over the place...) > > On the flip side, my current project is migrating 5 petabytes of data > from our old archive system that didn't have such rules (mostly due to > politics and the fact that the underlying XFS filesystem uses a 4K > blocksize so it wasn't as big an issue), so I'm stuck with what people put in there years ago. I would be tempted to zip up the directories and move them ziped ;-) JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7Ce8a4016223414177bf9408d4d33bdb31%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=pean0PRBgJJmtbZ7TwO%2BxiSvhKsba%2FRGI9VUCxhp6kM%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From jonathan at buzzard.me.uk Tue Jul 25 12:22:49 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 25 Jul 2017 12:22:49 +0100 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: <1500981769.4387.222.camel@buzzard.me.uk> On Tue, 2017-07-25 at 09:30 +0000, John Hearns wrote: > I agree with Jonathan. > > In my experience, if you look at why there are many small files being > stored by researchers, these are either the results of data acquisition > - high speed cameras, microscopes, or in my experience a wind tunnel. > Or the images are a sequence of images produced by a simulation which > are later post-processed into a movie or Ensight/Paraview format. When > questioned, the resaechers will always say "but I would like to keep > this data available just in case". In reality those files are never > looked at again. And as has been said if you have a tape based > archiving system you could end up with thousands of small files being > spread all over your tapes. So it is legitimate to make zips / tars of > directories like that. > Note that rules on data retention may require them to keep them for 10 years, so it is not unreasonable. Letting them spew thousands of files into an "archive" is not sensible. I was thinking of ways of getting the users to do it, and I guess leaving them with zero available file number quota in the new system would force them to zip up their data so they could add new stuff ;-) Archives in my view should have no quota on the space, only quota's on the number of files. Of course that might not be very popular. On reflection I think I would use a policy to restrict to files ending with .zip/.ZIP only. It's an archive and this format is effectively open source, widely understood and cross platform, and with the ZIP64 version will now stand the test of time too. Given it's an archive I would have a script that ran around setting all the files to immutable 7 days after creation too. 
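Something along those lines ought to be doable with the policy engine rather than a hand-rolled crawl; the sketch below is untested, and the rule names, script path and list-file parsing are my assumptions rather than anything official:

/* make-immutable.pol */
RULE EXTERNAL LIST 'toImmutable' EXEC '/usr/local/sbin/make_immutable.sh'
RULE 'weekOld' LIST 'toImmutable'
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME)) > 7

#!/bin/bash
# /usr/local/sbin/make_immutable.sh
# mmapplypolicy invokes this as: make_immutable.sh TEST|LIST <filelist> ...
# each filelist record is assumed to end with " -- <pathname>"
[ "$1" = "LIST" ] || exit 0
sed 's/^.* -- //' "$2" | while IFS= read -r f; do
    mmchattr -i yes "$f"
done

driven from cron with something like "mmapplypolicy /archive -P make-immutable.pol -I yes". Whether flipping the immutable flag is enough for the anti-fraud angle is a separate question, of course.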
Or maybe change the ownership and set a readonly ACL to the original user. Need to stop them changing stuff after the event if you are going to use to as part of your anti research fraud measures. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Tue Jul 25 17:11:45 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 25 Jul 2017 12:11:45 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? In-Reply-To: <1500973334.4387.201.camel@buzzard.me.uk> References: <28986.1500671597@turing-police.cc.vt.edu> <1500896590.4387.167.camel@buzzard.me.uk> <17702.1500911347@turing-police.cc.vt.edu> <1500973334.4387.201.camel@buzzard.me.uk> Message-ID: <88035.1500999105@turing-police.cc.vt.edu> On Tue, 25 Jul 2017 10:02:14 +0100, Jonathan Buzzard said: > I would be tempted to zip up the directories and move them ziped ;-) Not an option, unless you want to come here and re-write the researcher's tracking systems that knows where they archived a given run, and teach it "Except now it's in a .tar.gz in that directory, or perhaps one or two directories higher up, under some name". Yes, researchers do that. And as the joke goes: "What's the difference between a tenured professor and a terrorist?" "You can negotiate with a terrorist..." Plus remember that most of these directories are currently scattered across multiple tapes, which means "zip up a directory" may mean reading as many as 10 to 20 tapes just to get the directory on disk so you can zip it up. As it is, I had to write code that recall and processes all the files on tape 1, *wherever they are in the file system*, free them from the source disk, recall and process all the files on tape 2, repeat until tape 3,857. (And due to funding issues 5 years ago which turned into a "who paid for what tapes" food fight, most of the tapes ended up with files from entirely different file systems on them, going into different filesets on the destination). (And in fact, the migration is currently hosed up because a researcher *is* doing pretty much that - recalling all the files from one directory, then the next, then the next, to get files they need urgently for a deliverable but haven't been moved to the new system. So rather than having 12 LTO-5 drives to multistream the tape recalls, I've got 12 recalls fighting for one drive while the researcher's processing is hogging the other 11, due to the way the previous system prioritizes in-line opens of files versus bulk recalls) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From scbatche at us.ibm.com Tue Jul 25 21:46:45 2017 From: scbatche at us.ibm.com (Scott C Batchelder) Date: Tue, 25 Jul 2017 15:46:45 -0500 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf Message-ID: Hello: I am wondering if I can get some more information on the gpfsperf tool for baseline testing GPFS. I want to record GPFS read and write performance for a file system on the cluster before I enable DMAPI and configure the HSM interface. The README for the tool does not offer much insight in how I should run this tool based on the cluster or file system settings. The cluster that I will be running this tool on will not have MPI installed and will have multiple file systems in the cluster. Are there some best practises for running this tool? 
For example: - Should the number of threads equal the number of NSDs for the file system? or equal to the number of nodes? - If I execute a large multi-threaded run of this tool from a single node in the cluster, will that give me an accurate result of the performance of the file system? Any feedback is appreciated. Thanks. Sincerely, Scott Batchelder Phone: 1-281-883-7926 E-mail: scbatche at us.ibm.com 12301 Kurland Dr Houston, TX 77034-4812 United States -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 2022 bytes Desc: not available URL: From valdis.kletnieks at vt.edu Wed Jul 26 00:59:08 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 25 Jul 2017 19:59:08 -0400 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf In-Reply-To: References: Message-ID: <13777.1501027148@turing-police.cc.vt.edu> On Tue, 25 Jul 2017 15:46:45 -0500, "Scott C Batchelder" said: > - Should the number of threads equal the number of NSDs for the file > system? or equal to the number of nodes? Depends on what definition of "throughput" you are interested in. If your configuration has 50 clients banging on 5 NSD servers, your numbers for 5 threads and 50 threads are going to tell you subtly different things... (Basically, one thread per NSD is going to tell you the maximum that one client can expect to get with little to no contention, while one per client will tell you about the maximum *aggregate* that all 50 can get together - which is probably still giving each individual client less throughput than one-to-one....) We usually test with "exactly one thread total", "one thread per server", and "keep piling the clients on till the total number doesn't get any bigger". Also be aware that it only gives you insight to your workload performance if your workload is comprised of large file access - if your users are actually doing a lot of medium or small files, that changes the results dramatically as you end up possibly pounding on metadata more than the actual data.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From varun.mittal at in.ibm.com Wed Jul 26 04:42:27 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Wed, 26 Jul 2017 09:12:27 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. 
Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From varun.mittal at in.ibm.com Wed Jul 26 04:44:24 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Wed, 26 Jul 2017 09:14:24 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> Message-ID: Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. 
this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Wed Jul 26 18:28:55 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Jul 2017 17:28:55 +0000 Subject: [gpfsug-discuss] Lost disks Message-ID: I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it's due to a back end disk issue or if it's a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn't appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren't 100% sure that something at the disk array couldn't have caused this. Is there an easy way to see if there is still data on these disks? Short of a full restore from backup what other options might they have? The mmlsnsd -X show's blanks for device and device type now. # mmlsnsd -X Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- INGEST_FILEMGR_xis2301 0A23982E57FD995D - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2301 0A23982E57FD995D - - ingest-filemgr02.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - ingest-filemgr02.a.fXXXXXXX.net (not found) server node INGEST_FILEMGR_xis2303 0A23982E57FD9962 - - ingest-filemgr01.a.fXXXXXXX.net (not found) server node Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. 
If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Wed Jul 26 18:37:45 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Wed, 26 Jul 2017 13:37:45 -0400 Subject: [gpfsug-discuss] Baseline testing GPFS with gpfsperf In-Reply-To: <13777.1501027148@turing-police.cc.vt.edu> References: <13777.1501027148@turing-police.cc.vt.edu> Message-ID: Hi Scott, >>- Should the number of threads equal the number of NSDs for the file system? or equal to the number of nodes? >>- If I execute a large multi-threaded run of this tool from a single node in the cluster, will that give me an accurate result of the performance of the file system? To add to Valdis's note, the answer to above also depends on the node, network used for GPFS communication between client and server, as well as storage performance capabilities constituting the GPFS cluster/network/storage stack. As an example, if the storage subsystem (including controller + disks) hosting the file-system can deliver ~20 GB/s and the networking between NSD client and server is FDR 56Gb/s Infiniband (with verbsRdma = ~6GB/s). Assuming, one FDR-IB link (verbsPorts) is configured per NSD server as well as client, then you could need minimum of 4 x NSD servers (4 x 6GB/s ==> 24 GB/s) to saturate the backend storage. So, you would need to run gpfsperf (or anyother parallel I/O benchmark) across minimum of 4 x GPFS NSD clients to saturate the backend storage. You can scale the gpfsperf thread counts (-th parameter) depending on access pattern (buffered/dio etc) but this would only be able to drive load from single NSD client node. If you would like to drive I/O load from multiple NSD client nodes + synchronize the parallel runs across multiple nodes for accuracy, then gpfsperf-mpi would be strongly recommended. You would need to use MPI to launch gpfsperf-mpi across multiple NSD client nodes and scale the MPI processes (across NSD clients with 1 or more MPI process per NSD client) accordingly to drive the I/O load for good performance. >>The cluster that I will be running this tool on will not have MPI installed and will have multiple file systems in the cluster. Without MPI, alternative would be to use ssh or pdsh to launch gpfsperf across multiple nodes however if there are slow NSD clients then the performance may not be accurate (slow clients taking longer and after faster clients finished it will get all the network/storage resources skewing the performance analysis. You may also consider using parallel Iozone as it can be run across multiple node using rsh/ssh with combination of "-+m" and "-t" option. http://iozone.org/docs/IOzone_msword_98.pdf ## -+m filename Use this file to obtain the configuration informati on of the clients for cluster testing. The file contains one line for each client. Each line has th ree fields. The fields are space delimited. 
A # sign in column zero is a comment line. The first fi eld is the name of the client. The second field is the path, on the client, for the working directory where Iozone will execute. The third field is the path, on the client, for the executable Iozone. To use this option one must be able to execute comm ands on the clients without being challenged for a password. Iozone will start remote execution by using ?rsh" To use ssh, export RSH=/usr/bin/ssh -t # Run Iozone in a throughput mode. This option allows the user to specify how many threads or processes to have active during th e measurement. ## Hope this helps, -Kums From: valdis.kletnieks at vt.edu To: gpfsug main discussion list Date: 07/25/2017 07:59 PM Subject: Re: [gpfsug-discuss] Baseline testing GPFS with gpfsperf Sent by: gpfsug-discuss-bounces at spectrumscale.org On Tue, 25 Jul 2017 15:46:45 -0500, "Scott C Batchelder" said: > - Should the number of threads equal the number of NSDs for the file > system? or equal to the number of nodes? Depends on what definition of "throughput" you are interested in. If your configuration has 50 clients banging on 5 NSD servers, your numbers for 5 threads and 50 threads are going to tell you subtly different things... (Basically, one thread per NSD is going to tell you the maximum that one client can expect to get with little to no contention, while one per client will tell you about the maximum *aggregate* that all 50 can get together - which is probably still giving each individual client less throughput than one-to-one....) We usually test with "exactly one thread total", "one thread per server", and "keep piling the clients on till the total number doesn't get any bigger". Also be aware that it only gives you insight to your workload performance if your workload is comprised of large file access - if your users are actually doing a lot of medium or small files, that changes the results dramatically as you end up possibly pounding on metadata more than the actual data.... [attachment "att0twxd.dat" deleted by Kumaran Rajaram/Arlington/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 26 18:45:35 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 26 Jul 2017 17:45:35 +0000 Subject: [gpfsug-discuss] Lost disks Message-ID: One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. 
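As a rough, read-only first check (the device name below is a placeholder for one of the affected LUNs), you can look at the first sectors to see whether a v1 descriptor string is still present at all, and compare that with what GPFS itself can read:

   dd if=/dev/dm-19 bs=512 count=256 2>/dev/null | strings | grep -i "NSD descriptor"
   mmfsadm test readdescraw /dev/dm-19

If the string and a sane readdescraw output survive on some LUNs but not on others, only part of the descriptors were overwritten, which is the case where manual reconstruction has a chance.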
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Jul 26 19:18:38 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 26 Jul 2017 18:18:38 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: it can happen for multiple reasons , one is a linux install, unfortunate there are significant more simpler explanations. Linux as well as BIOS in servers from time to time looks for empty disks and puts a GPT label on it if the disk doesn't have one, etc. this thread is explaining a lot of this : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014439222 this is why we implemented NSD V2 format long time ago , unfortunate there is no way to convert an V1 NSD to a V2 nsd on an existing filesytem except you remove the NSDs one at a time and re-add them after you upgraded the system to at least GPFS 4.1 (i would recommend a later version like 4.2.3) some more details are here in this thread : https://www.ibm.com/developerworks/community/forums/html/threadTopic?id=5c1ee5bc-41b8-4318-a74e-4d962f82ce2e but a quick summary of the benefits of V2 are : - ? Support for GPT NSD ? - Adds a standard disk partition table (GPT type) to NSDs ? - Disk label support for Linux ? - New GPFS NSD v2 format provides the following benefits: ? - Includes a partition table so that the disk is recognized as a GPFS device ? - Adjusts data alignment to support disks with a 4 KB physical block size ? - Adds backup copies of some key GPFS data structures ? - Expands some reserved areas to allow for future growth the main reason we can't convert from V1 to V2 is the on disk format changed significant so we would have to move on disk data which is very risky. hope that explains this. Sven On Wed, Jul 26, 2017 at 10:29 AM Mark Bush wrote: > I have a client has had an issue where all of the nsd disks disappeared in > the cluster recently. Not sure if it?s due to a back end disk issue or if > it?s a reboot that did it. But in their PMR they were told that all that > data is lost now and that the disk headers didn?t appear as GPFS disk > headers. How on earth could something like that happen? Could it be a > backend disk thing? They are confident that nobody tried to reformat disks > but aren?t 100% sure that something at the disk array couldn?t have caused > this. > > > > Is there an easy way to see if there is still data on these disks? > > Short of a full restore from backup what other options might they have? > > > > The mmlsnsd -X show?s blanks for device and device type now. 
> > > > # mmlsnsd -X > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > INGEST_FILEMGR_xis2301 0A23982E57FD995D - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2301 0A23982E57FD995D - - > ingest-filemgr02.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2302 0A23982E57FD9960 - - > ingest-filemgr02.a.fXXXXXXX.net (not found) server node > > INGEST_FILEMGR_xis2303 0A23982E57FD9962 - - > ingest-filemgr01.a.fXXXXXXX.net (not found) server node > > > > > > *Mark* > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > Sirius Computer Solutions > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Jul 26 19:19:15 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Jul 2017 18:19:15 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. 
But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 26 20:05:59 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 26 Jul 2017 19:05:59 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: IBM has a procedure for it that may work in some cases, but you?re manually editing the NSD descriptors on disk. Contact IBM if you think an NSD has been lost to descriptor being re-written. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 1:19 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Jul 27 11:39:28 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 27 Jul 2017 10:39:28 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: Mark, I once rescued a system which had the disk partition on the OS disks deleted. (This was a system with a device mapper RAID pair of OS disks). Download a copy of sysrescue http://www.system-rescue-cd.org/ and create a bootable USB stick (or network boot). When you boot the system in sysrescue it has a utility to scan disks which will identify existing partitions, even if the partition table has been erased. I can?t say if this will do anything with the disks in your system, but this is certainly worth a try if you suspect that the data is all still on disk. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: Wednesday, July 26, 2017 8:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. 
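Whichever rescue route is tried, a cheap precaution before any tool or manual edit touches the LUNs is to set aside a copy of the first few megabytes of every suspect disk, so a later attempt can be backed out. A minimal sketch (it only reads the disks, apart from the image files it writes; paths and device names are placeholders):

   for d in /dev/dm-19 /dev/dm-20
   do
       # 4 MiB comfortably covers the descriptor area at the start of the disk
       dd if=$d of=/root/nsdhdr.$(basename $d).img bs=1M count=4
   done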
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Thu Jul 27 11:58:08 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 11:58:08 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: <1501153088.26563.39.camel@buzzard.me.uk> On Wed, 2017-07-26 at 17:45 +0000, Oesterlin, Robert wrote: > One way this could possible happen would be a system is being > installed (I?m assuming this is Linux) and the FC adapter is active; > then the OS install will see disks and wipe out the NSD descriptor on > those disks. (Which is why the NSD V2 format was invented, to prevent > this from happening) If you don?t lose all of the descriptors, it?s > sometimes possible to manually re-construct the missing header > information - I?m assuming since you opened a PMR, IBM has looked at > this. This is a scenario I?ve had to recover from - twice. Back-end > array issue seems unlikely to me, I?d keep looking at the systems with > access to those LUNs and see what commands/operations could have been > run. I would concur that this is the most likely scenario; an install where for whatever reason the machine could see the disks and they are gone. I know that RHEL6 and its derivatives will do that for you. Has happened to me at previous place of work where another admin forgot to de-zone a server, went to install CentOS6 as part of a cluster upgrade from CentOS5 and overwrote all the NSD descriptors. Thing is GPFS does not look at the NSD descriptors that much. So in my case it was several days before it was noticed, and only then because I rebooted the last NSD server as part of a rolling upgrade of GPFS. I could have cruised for weeks/months with no NSD descriptors if I had not restarted all the NSD servers. The moral of this is the overwrite could have take place quite some time ago. Basically if the disks are all missing then the NSD descriptor has been overwritten, and the protestations of the client are irrelevant. The chances of the disk array doing it to *ALL* the disks is somewhere around ? IMHO. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From richard.rupp at us.ibm.com Thu Jul 27 12:28:35 2017 From: richard.rupp at us.ibm.com (RICHARD RUPP) Date: Thu, 27 Jul 2017 07:28:35 -0400 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: If you are under IBM support, leverage IBM for help. A third party utility has the possibility of making it worse. From: John Hearns To: gpfsug main discussion list Date: 07/27/2017 06:40 AM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org Mark, I once rescued a system which had the disk partition on the OS disks deleted. (This was a system with a device mapper RAID pair of OS disks). Download a copy of sysrescue http://www.system-rescue-cd.org/ and create a bootable USB stick (or network boot). When you boot the system in sysrescue it has a utility to scan disks which will identify existing partitions, even if the partition table has been erased. I can?t say if this will do anything with the disks in your system, but this is certainly worth a try if you suspect that the data is all still on disk. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: Wednesday, July 26, 2017 8:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks What is this manual header reconstruction you speak of? That doesn?t sound trivial at all. 
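To narrow down where and when the overwrite happened, a read-only inventory from each node that can see the LUNs shows what is sitting on them now (device names are placeholders; nothing here writes to disk):

   lsblk -f
   blkid /dev/dm-19
   wipefs /dev/dm-19     # with no options wipefs only lists signatures, it does not erase

A fresh GPT, LVM or filesystem signature on what used to be a raw v1 NSD points at the install/relabel scenario discussed above rather than at the disk array.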
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Wednesday, July 26, 2017 12:46 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Lost disks One way this could possible happen would be a system is being installed (I?m assuming this is Linux) and the FC adapter is active; then the OS install will see disks and wipe out the NSD descriptor on those disks. (Which is why the NSD V2 format was invented, to prevent this from happening) If you don?t lose all of the descriptors, it?s sometimes possible to manually re-construct the missing header information - I?m assuming since you opened a PMR, IBM has looked at this. This is a scenario I?ve had to recover from - twice. Back-end array issue seems unlikely to me, I?d keep looking at the systems with access to those LUNs and see what commands/operations could have been run. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush < Mark.Bush at siriuscom.com> Reply-To: gpfsug main discussion list Date: Wednesday, July 26, 2017 at 12:29 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Lost disks I have a client has had an issue where all of the nsd disks disappeared in the cluster recently. Not sure if it?s due to a back end disk issue or if it?s a reboot that did it. But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. How on earth could something like that happen? Could it be a backend disk thing? They are confident that nobody tried to reformat disks but aren?t 100% sure that something at the disk array couldn?t have caused this. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
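Looking past the recovery itself: the one-disk-at-a-time cycle Sven describes is how existing v1 NSDs get onto the v2 format that was introduced to prevent exactly this. A rough sketch of one iteration, with file system, NSD, server and device names as placeholders (enough free capacity is needed because mmdeldisk migrates data off the disk first), using a stanza file nsd01.stanza containing:

   %nsd:
     device=/dev/sdX
     nsd=nsd_new01
     servers=nsdsrv01,nsdsrv02
     usage=dataAndMetadata
     failureGroup=1

and then:

   mmdeldisk fs_gpfs01 nsd_old01        # drains data off the disk, then removes it from the file system
   mmdelnsd nsd_old01                   # frees the old NSD definition
   mmcrnsd -F nsd01.stanza              # per Sven's note, on 4.1+ code levels the new NSD is created in v2 format
   mmadddisk fs_gpfs01 -F nsd01.stanza  # add it back to the file system
   mmrestripefs fs_gpfs01 -b            # optional rebalance afterwards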
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Thu Jul 27 12:58:50 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 12:58:50 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: Message-ID: <1501156730.26563.49.camel@strath.ac.uk> On Thu, 2017-07-27 at 07:28 -0400, RICHARD RUPP wrote: > If you are under IBM support, leverage IBM for help. A third party > utility has the possibility of making it worse. > The chances of recovery are slim in the first place from this sort of problem. At least with v1 NSD descriptors. Further IBM have *ALREADY* told him the data is lost, I quote But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. So in this scenario you have little to loose trying something because you are now on your own. Worst case scenario is that whatever you try does not work, which leave you no worse of than you are now. Well apart from lost time for the restore, but you might have started that already to somewhere else. I was once told by IBM (nine years ago now) that my GPFS file system was caput and to arrange a restore from tape. At which point some fiddling by myself fixed the problem and a 100TB restore was no longer required. However this was not due to overwritten NSD descriptors. When that happened the two file systems effected had to be restored. Well bizarrely one was still mounted and I was able to rsync the data off. However the point is that at this stage fiddling with third party tools is the only option left. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From UWEFALKE at de.ibm.com Thu Jul 27 15:18:02 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 27 Jul 2017 16:18:02 +0200 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501156730.26563.49.camel@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> Message-ID: "Just doing something" makes things worse usually. Whether a 3rd party tool knows how to handle GPFS NSDs can be doubted (as long as it is not dedicated to that purpose). First, I'd look what is actually on the sectors where the NSD headers used to be, and try to find whether data beyond that area were also modified (if the latter is the case, restoring the NSDs does not make much sense as data and/or metadata (depending on disk usage) would also be corrupted. If you are sure that just the NSD header area has been affected, you might try to trick GPFS in getting just the information into the header area needed that GPFS recognises the devices as the NSDs they were. 
The first 4 kiB of a v1 NSD from a VM on my laptop look like $ cat nsdv1head | od --address-radix=x -xc 000000 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000200 cf70 4192 0000 0100 0000 3000 e930 a028 p 317 222 A \0 \0 \0 001 \0 \0 \0 0 0 351 ( 240 000210 a8c0 ce7a a251 1f92 a251 1a92 0000 0800 300 250 z 316 Q 242 222 037 Q 242 222 032 \0 \0 \0 \b 000220 0000 f20f 0000 0000 0000 0000 0000 0000 \0 \0 017 362 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000230 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000400 93d2 7885 0000 0100 0000 0002 141e 64a8 322 223 205 x \0 \0 \0 001 \0 \0 002 \0 036 024 250 d 000410 a8c0 ce7a a251 3490 0000 fa0f 0000 0800 300 250 z 316 Q 242 220 4 \0 \0 017 372 \0 \0 \0 \b 000420 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000480 534e 2044 6564 6373 6972 7470 726f 6620 N S D d e s c r i p t o r f 000490 726f 2f20 6564 2f76 6476 2062 7263 6165 o r / d e v / v d b c r e a 0004a0 6574 2064 7962 4720 4650 2053 6f4d 206e t e d b y G P F S M o n 0004b0 614d 2079 3732 3020 3a30 3434 303a 2034 M a y 2 7 0 0 : 4 4 : 0 4 0004c0 3032 3331 000a 0000 0000 0000 0000 0000 2 0 1 3 \n \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 0004d0 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 000e00 4c5f 4d56 0000 017d 0000 017d 0000 017d _ L V M \0 \0 } 001 \0 \0 } 001 \0 \0 } 001 000e10 0000 017d 0000 0000 0000 0000 0000 0000 \0 \0 } 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000e20 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 000e30 0000 0000 0000 0000 0000 0000 017d 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 } 001 \0 \0 000e40 0000 0000 0000 0000 0000 0000 0000 0000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 001000 I suppose, the important area starts at 0x0200 (ie. with the second 512Byte sector) and ends at 0x04df (which would be within the 3rd 512Bytes sector, hence the 2nd and 3rd sectors appear crucial). I think that there is some more space before the payload area starts. Without knowledge what exactly has to go into the header, I'd try to create an NSD on one or two (new) disks, save the headers, then create an FS on them, save the headers again, check if anything has changed. So, creating some new NSDs, checking what keys might appear there and in the cluster configuration could get you very close to craft the header information which is gone. Of course, that depends on how dear the data on the gone FS AKA SG are and how hard it'd be to rebuild them otherwise (replay from backup, recalculate, ...) It seems not a bad idea to set aside the NSD headers of your NSDs in a back up :-) And also now: Before amending any blocks on your disks, save them! Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/27/2017 01:59 PM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org On Thu, 2017-07-27 at 07:28 -0400, RICHARD RUPP wrote: > If you are under IBM support, leverage IBM for help. A third party > utility has the possibility of making it worse. > The chances of recovery are slim in the first place from this sort of problem. At least with v1 NSD descriptors. Further IBM have *ALREADY* told him the data is lost, I quote But in their PMR they were told that all that data is lost now and that the disk headers didn?t appear as GPFS disk headers. So in this scenario you have little to loose trying something because you are now on your own. Worst case scenario is that whatever you try does not work, which leave you no worse of than you are now. Well apart from lost time for the restore, but you might have started that already to somewhere else. I was once told by IBM (nine years ago now) that my GPFS file system was caput and to arrange a restore from tape. At which point some fiddling by myself fixed the problem and a 100TB restore was no longer required. However this was not due to overwritten NSD descriptors. When that happened the two file systems effected had to be restored. Well bizarrely one was still mounted and I was able to rsync the data off. However the point is that at this stage fiddling with third party tools is the only option left. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Thu Jul 27 16:09:31 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 16:09:31 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: <1501156730.26563.49.camel@strath.ac.uk> Message-ID: <1501168171.26563.56.camel@strath.ac.uk> On Thu, 2017-07-27 at 16:18 +0200, Uwe Falke wrote: > "Just doing something" makes things worse usually. Whether a 3rd > party tool knows how to handle GPFS NSDs can be doubted (as long as it > is not dedicated to that purpose). It might usually, but IBM have *ALREADY* given up in this case and told the customer their data is toast. Under these circumstances other than wasting time that could have been spent profitably on a restore it is *IMPOSSIBLE* to make the situation worse. [SNIP] > It seems not a bad idea to set aside the NSD headers of your NSDs in a > back up :-) > And also now: Before amending any blocks on your disks, save them! > It's called NSD v2 descriptor format, so rather than use raw disks they are in a GPT partition, and for good measure a backup copy is stored at the end of the disk too. Personally if I had any v1 NSD's in a file system I would have a plan for a series of mmdeldisk/mmcrnsd/mmadddisk to get them all to v2 sooner rather than later. JAB. -- Jonathan A. 
Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Robert.Oesterlin at nuance.com Thu Jul 27 16:28:02 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 27 Jul 2017 15:28:02 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format each is? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jul 27 16:51:29 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 27 Jul 2017 17:51:29 +0200 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501168171.26563.56.camel@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> Message-ID: gpfsug-discuss-bounces at spectrumscale.org wrote on 07/27/2017 05:09:31 PM: > From: Jonathan Buzzard > To: gpfsug main discussion list > Date: 07/27/2017 05:09 PM > Subject: Re: [gpfsug-discuss] Lost disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On Thu, 2017-07-27 at 16:18 +0200, Uwe Falke wrote: > > > "Just doing something" makes things worse usually. Whether a 3rd > > party tool knows how to handle GPFS NSDs can be doubted (as long as it > > is not dedicated to that purpose). > > It might usually, but IBM have *ALREADY* given up in this case and told > the customer their data is toast. Under these circumstances other than > wasting time that could have been spent profitably on a restore it is > *IMPOSSIBLE* to make the situation worse. SCNR: It is always possible to make things worse. However, of course, if the efforts to do research on that system appear too expensive compared to the possible gain, then it is wise to give up and restore data from backup to a new file system. > > [SNIP] > > > It seems not a bad idea to set aside the NSD headers of your NSDs in a > > back up :-) > > And also now: Before amending any blocks on your disks, save them! > > > > It's called NSD v2 descriptor format, so rather than use raw disks they > are in a GPT partition, and for good measure a backup copy is stored at > the end of the disk too. > > Personally if I had any v1 NSD's in a file system I would have a plan > for a series of mmdeldisk/mmcrnsd/mmadddisk to get them all to v2 sooner > rather than later. Yep, but I suppose the gone NSDs were v1. Then, there might be some restrictions blocking the move from NSDv1 to NSDv2 (old FS level still req.ed, or just the hugeness of a file system). And you never know, if some tool runs wild due to logical failures it overwrites all GPT copies on a disk and you're lost again (but of course NSDv2 has been a tremendous step ahead). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From luke.raimbach at googlemail.com Thu Jul 27 17:09:42 2017 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Thu, 27 Jul 2017 16:09:42 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> References: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: mmfsadm test readdescraw On Thu, 27 Jul 2017, 16:28 Oesterlin, Robert, wrote: > I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format > each is? > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Jul 27 17:17:20 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 27 Jul 2017 16:17:20 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <50669E00-32A8-4AC7-A729-CB961F96ECAE@nuance.com> Right - but what field do I look at? Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Luke Raimbach Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 11:10 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? mmfsadm test readdescraw -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Jul 27 19:26:45 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 27 Jul 2017 19:26:45 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> Message-ID: <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> On 27/07/17 16:51, Uwe Falke wrote: [SNIP] > SCNR: It is always possible to make things worse. > However, of course, if the efforts to do research on that system appear > too expensive compared to the possible gain, then it is wise to give up > and restore data from backup to a new file system. > Explain to me when IBM have washed their hands of the situation; that is they deem the file system unrecoverable and will take no further action to help the customer, how under these circumstances it is possible for it to get any worse attempting to recover the situation yourself? The answer is you can't so and are talking complete codswallop. In general you are right, in this situation you are utterly and totally wrong. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From chair at spectrumscale.org Thu Jul 27 21:19:15 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Thu, 27 Jul 2017 21:19:15 +0100 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> References: <1501156730.26563.49.camel@strath.ac.uk> <1501168171.26563.56.camel@strath.ac.uk> <3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> Message-ID: Guys, this is supposed to be a community mailing list where people can come and ask questions and we can have healthy debate, but please can we keep it calm? Thanks Simon Group Chair From sfadden at us.ibm.com Thu Jul 27 21:33:19 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Thu, 27 Jul 2017 20:33:19 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: References: , <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Jul 28 00:29:47 2017 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 28 Jul 2017 00:29:47 +0100 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> References: <58646EEA-0AD2-4EC8-8A05-070064C35F2E@nuance.com> Message-ID: On 27/07/17 16:28, Oesterlin, Robert wrote: > I?m sure I have a mix of V1 and V2 NSDs - how can I tell what the format > each is? Well on anything approaching a recent Linux lsblk should as I understand it should show GPT partitions on v2 NSD's. Normally a v1 NSD would show up as a raw block device. I guess you could have created the v1 NSD's inside a partition but that was not normal practice. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From daniel.kidger at uk.ibm.com Fri Jul 28 12:03:40 2017 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Fri, 28 Jul 2017 11:03:40 +0000 Subject: [gpfsug-discuss] Lost disks In-Reply-To: References: , <1501156730.26563.49.camel@strath.ac.uk><1501168171.26563.56.camel@strath.ac.uk><3ea4371f-49c9-1236-44e2-d44f39e27c9e@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 28 12:46:47 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 28 Jul 2017 11:46:47 +0000 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Message-ID: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Scott Fadden Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 3:33 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? 
# mmfsadm test readdescraw /dev/dm-14 | grep " original format" original format version 1600, cur version 1700 (mgr 1700, helper 1700, mnode 1700) The harder part is what version number = v2 and what matches version 1. The real answer is there is not a simple one, it is not really v1 vs v2 it is what feature you are interested in. Just one small example 4K Disk SECTOR support started in 1403 Dynamically enabling quotas started in 1404 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 28 13:44:11 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 28 Jul 2017 12:44:11 +0000 Subject: [gpfsug-discuss] LROC example Message-ID: <8103C497-EFA2-41E3-A047-4C3A3AA3EC0B@nuance.com> For those of you considering LROC, you may find this interesting. LROC can be very effective in some job mixes, as shown below. This is in a compute cluster of about 400 nodes. Each compute node has a 100GB LROC. In this particular job mix, LROC was recalling 3-4 times the traffic that was going to the NSDs. I see other cases where?s it?s less effective. [cid:image001.png at 01D30775.4ACF3D20] Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 54425 bytes Desc: image001.png URL: From knop at us.ibm.com Fri Jul 28 13:44:26 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 28 Jul 2017 08:44:26 -0400 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> References: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Message-ID: Bob, I believe the NSD format version (v1 vs v2) is shown in the " format version" line that starts with "NSDid" : # mmfsadm test readdescraw /dev/dm-11 NSD descriptor in sector 64 of /dev/dm-11 NSDid: 9461C0A85788693A format version: 1403 Label: It should say "1403" when the format is v2. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 07/28/2017 07:47 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Scott Fadden Reply-To: gpfsug main discussion list Date: Thursday, July 27, 2017 at 3:33 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? 
# mmfsadm test readdescraw /dev/dm-14 | grep " original format" original format version 1600, cur version 1700 (mgr 1700, helper 1700, mnode 1700) The harder part is what version number = v2 and what matches version 1. The real answer is there is not a simple one, it is not really v1 vs v2 it is what feature you are interested in. Just one small example 4K Disk SECTOR support started in 1403 Dynamically enabling quotas started in 1404 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gcorneau at us.ibm.com Fri Jul 28 20:07:54 2017 From: gcorneau at us.ibm.com (Glen Corneau) Date: Fri, 28 Jul 2017 14:07:54 -0500 Subject: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? In-Reply-To: References: <6DDEF46C-D30E-4C68-B569-770DE04A9D5B@nuance.com> Message-ID: Just a note for my AIX folks out there (and I know there's at least one!): When NSDv2 (version 1403) disks are defined in AIX we *don't* create GPTs on those LUNs. However with GPFS (Spectrum Scale) installed on AIX we will place the NSD name in the "VG" column of lsvg. But yes, we've had situations of customers creating new VGs on existing GPFS LUNs (force!) and destroying file systems. ------------------ Glen Corneau Power Systems Washington Systems Center gcorneau at us.ibm.com From: "Felipe Knop" To: gpfsug main discussion list Date: 07/28/2017 07:45 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Bob, I believe the NSD format version (v1 vs v2) is shown in the " format version" line that starts with "NSDid" : # mmfsadm test readdescraw /dev/dm-11 NSD descriptor in sector 64 of /dev/dm-11 NSDid: 9461C0A85788693A format version: 1403 Label: It should say "1403" when the format is v2. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 07/28/2017 07:47 AM Subject: Re: [gpfsug-discuss] NSD V2 vs V1 - how can you tell? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Scott This refers to the file system format which is independent of the NSD version number. File systems can be upgraded but all the NSDs are still at V1. For instance, here is an NSD I know is still V1: [root at gpfs2-gpfs01 ~]# grep msa0319VOL2 volmaps msa0319VOL2 mpathel (3600c0ff0001497e259ebac5001000000) dm-19 14T sdad 0[active][ready] sdft 1[active][ready] sdam 2[active][ready] sdgc 3[active][ready] [root at gpfs2-gpfs01 ~]# mmfsadm test readdescraw /dev/dm-19 | grep " original format" original format version 1001, cur version 1600 (mgr 1600, helper 1600, mnode 1600) The file system version is current however. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Sun Jul 30 04:22:25 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Sat, 29 Jul 2017 23:22:25 -0400 Subject: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? 
In-Reply-To: <1500908233.4387.194.camel@buzzard.me.uk> References: <33069.1500675853@turing-police.cc.vt.edu>, <28986.1500671597@turing-police.cc.vt.edu> <1500908233.4387.194.camel@buzzard.me.uk> Message-ID: Jonathan, all, We'll be introducing some clarification into the publications to highlight that data is not stored in the inode for encrypted files. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/24/2017 10:57 AM Subject: Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode? Sent by: gpfsug-discuss-bounces at spectrumscale.org On Mon, 2017-07-24 at 14:45 +0000, James Davis wrote: > Hey all, > > On the documentation of encryption restrictions and encryption/HAWC > interplay... > > The encryption documentation currently states: > > "Secure storage uses encryption to make data unreadable to anyone who > does not possess the necessary encryption keys...Only data, not > metadata, is encrypted." > > The HAWC restrictions include: > > "Encrypted data is never stored in the recovery log..." > > If this is unclear, I'm open to suggestions for improvements. > Just because *DATA* is stored in the metadata does not make it magically metadata. It's still data so you could quite reasonably conclude that it is encrypted. We have now been disabused of this, but the documentation is not clear and needs clarifying. Perhaps say metadata blocks are not encrypted. Or just a simple data stored in inodes is not encrypted would suffice. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Jul 31 05:57:44 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 31 Jul 2017 00:57:44 -0400 Subject: [gpfsug-discuss] Lost disks In-Reply-To: <1501153088.26563.39.camel@buzzard.me.uk> References: <1501153088.26563.39.camel@buzzard.me.uk> Message-ID: Jonathan, Regarding >> Thing is GPFS does not look at the NSD descriptors that much. So in my >> case it was several days before it was noticed, and only then because I >> rebooted the last NSD server as part of a rolling upgrade of GPFS. I >> could have cruised for weeks/months with no NSD descriptors if I had not >> restarted all the NSD servers. The moral of this is the overwrite could >> have take place quite some time ago. While GPFS does not normally read the NSD descriptors in the course of performing file system operations, as of 4.1.1 a periodic check is done on the content of various descriptors, and a message like [E] On-disk NSD descriptor of is valid but has a different ID. 
ID in cache is and ID on-disk is should get issued if the content of the descriptor on disk changes. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug main discussion list Date: 07/27/2017 06:58 AM Subject: Re: [gpfsug-discuss] Lost disks Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, 2017-07-26 at 17:45 +0000, Oesterlin, Robert wrote: > One way this could possible happen would be a system is being > installed (I?m assuming this is Linux) and the FC adapter is active; > then the OS install will see disks and wipe out the NSD descriptor on > those disks. (Which is why the NSD V2 format was invented, to prevent > this from happening) If you don?t lose all of the descriptors, it?s > sometimes possible to manually re-construct the missing header > information - I?m assuming since you opened a PMR, IBM has looked at > this. This is a scenario I?ve had to recover from - twice. Back-end > array issue seems unlikely to me, I?d keep looking at the systems with > access to those LUNs and see what commands/operations could have been > run. I would concur that this is the most likely scenario; an install where for whatever reason the machine could see the disks and they are gone. I know that RHEL6 and its derivatives will do that for you. Has happened to me at previous place of work where another admin forgot to de-zone a server, went to install CentOS6 as part of a cluster upgrade from CentOS5 and overwrote all the NSD descriptors. Thing is GPFS does not look at the NSD descriptors that much. So in my case it was several days before it was noticed, and only then because I rebooted the last NSD server as part of a rolling upgrade of GPFS. I could have cruised for weeks/months with no NSD descriptors if I had not restarted all the NSD servers. The moral of this is the overwrite could have take place quite some time ago. Basically if the disks are all missing then the NSD descriptor has been overwritten, and the protestations of the client are irrelevant. The chances of the disk array doing it to *ALL* the disks is somewhere around ? IMHO. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Mon Jul 31 18:30:34 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 31 Jul 2017 17:30:34 +0000 Subject: [gpfsug-discuss] Auditing Message-ID: Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? 
Am I barking up the wrong tree for this, or is there a better way to get this type of data from a Spectrum Scale filesystem? Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL:
From Renar.Grunenberg at huk-coburg.de Mon Jul 31 18:44:21 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 17:44:21 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement Message-ID: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Hallo All, we are on Version 4.2.3.2 and see some misunderstanding in the enforcement of hard limit definitions on a fileset quota. What we see: we put some 200 GB files onto the following quota definitions: quota 150 GB, limit 250 GB, grace none. After creating the first 200 GB file we hit the soft quota limit, that's OK. But after the second file was created we expected an I/O error, and it didn't happen. We defined all the well-known parameters (-Q, ...) on the filesystem. Is this a bug or a feature? mmcheckquota was already run beforehand. Regards Renar. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL:
From eric.wonderley at vt.edu Mon Jul 31 18:54:52 2017 From: eric.wonderley at vt.edu (J.
Eric Wonderley) Date: Mon, 31 Jul 2017 13:54:52 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar < Renar.Grunenberg at huk-coburg.de> wrote: > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the enforcement > of hardlimit definitions on a flieset quota. What we see is we put some 200 > GB files on following quota definitions: quota 150 GB Limit 250 GB Grace > none. > After the creating of one 200 GB we hit the softquota limit, thats ok. But > After the the second file was created!! we expect an io error but it don?t > happen. We define all well know Parameters (-Q,..) on the filesystem . Is > this a bug or a Feature? mmcheckquota are already running at first. > Regards Renar. > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ------------------------------ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > ------------------------------ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfosburg at mdanderson.org Mon Jul 31 18:56:46 2017 From: jfosburg at mdanderson.org (Fosburgh,Jonathan) Date: Mon, 31 Jul 2017 17:56:46 +0000 Subject: [gpfsug-discuss] Auditing In-Reply-To: References: Message-ID: At present there is not a method to audit file access. Jonathan Fosburgh Principal Application Systems Analyst Storage Team IT Operations jfosburg at mdanderson.org (713) 745-9346 On 07/31/2017 12:30 PM, Mark Bush wrote: Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? 
Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The information contained in this e-mail message may be privileged, confidential, and/or protected from disclosure. This e-mail message may contain protected health information (PHI); dissemination of PHI should comply with applicable federal and state laws. If you are not the intended recipient, or an authorized representative of the intended recipient, any further review, disclosure, use, dissemination, distribution, or copying of this message or any attachment (or the information contained therein) is strictly prohibited. If you think that you have received this e-mail message in error, please notify the sender by return e-mail and delete all references to it and its contents from your systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jul 31 19:02:30 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 31 Jul 2017 18:02:30 +0000 Subject: [gpfsug-discuss] Re Auditing Message-ID: We run a policy that looks like this: -- cut here -- define(daysToEpoch, days(timestamp('1970-01-01 00:00:00.0'))) define(unixTS, char(int( (( days(\$1) - daysToEpoch ) * 86400) + ( hour(\$1) * 3600) + (minute(\$1) * 60) + (second(\$1)) )) ) rule 'dumpall' list '"$filesystem"' DIRECTORIES_PLUS SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' ) -- cut here -- Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Monday, July 31, 2017 at 12:31 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Auditing Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Mon Jul 31 19:05:37 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Mon, 31 Jul 2017 18:05:37 +0000 Subject: [gpfsug-discuss] Re Auditing In-Reply-To: References: Message-ID: Brilliant. Thanks Bob. 
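A minimal sketch of one way to run such a list policy, for anyone who wants to try it: the policy file path, the list name 'audit', the output prefix and the device name fs1 below are placeholders, and depending on the release the LIST rule may need a matching EXTERNAL LIST rule with an empty EXEC so that mmapplypolicy simply writes the records to a file (the backslashes in \$1 look like shell escaping from a here-document and would not be needed in a standalone policy file).

-- cut here --
/* no-op external list: pairs with the LIST rule above (rename that list to 'audit', or make the names match) */
RULE EXTERNAL LIST 'audit' EXEC ''
-- cut here --

# save the define()/rule text above as /tmp/audit.pol, then scan without invoking any external script:
mmapplypolicy fs1 -P /tmp/audit.pol -I defer -f /tmp/audit
# the matched records ('|'-separated, one line per file) should land in a file named like /tmp/audit.list.audit

Note that the SHOW() fields capture ownership and timestamps as of the scan, so this gives a point-in-time inventory rather than a log of individual open/read/delete events.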
From: Oesterlin, Robert [mailto:Robert.Oesterlin at nuance.com] Sent: Monday, July 31, 2017 1:03 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Re Auditing We run a policy that looks like this: -- cut here -- define(daysToEpoch, days(timestamp('1970-01-01 00:00:00.0'))) define(unixTS, char(int( (( days(\$1) - daysToEpoch ) * 86400) + ( hour(\$1) * 3600) + (minute(\$1) * 60) + (second(\$1)) )) ) rule 'dumpall' list '"$filesystem"' DIRECTORIES_PLUS SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' ) -- cut here -- Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Mark Bush > Reply-To: gpfsug main discussion list > Date: Monday, July 31, 2017 at 12:31 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] [gpfsug-discuss] Auditing Does someone already have a policy that can extract typical file audit items (user_id, last written, opened/accessed, modified, deleted, etc)? Am I barking up the wrong tree for this is there a better way to get this type of data from a Spectrum Scale filesystem? This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Jul 31 19:26:52 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 31 Jul 2017 14:26:52 -0400 Subject: [gpfsug-discuss] Re Auditing - timestamps In-Reply-To: References: Message-ID: The "ILM" chapter in the Admin Guide has some tips, among which: 18. You can convert a time interval value to a number of seconds with the SQL cast syntax, as in the following example: define([toSeconds],[(($1) SECONDS(12,6))]) define([toUnixSeconds],[toSeconds($1 - ?1970-1-1 at 0:00?)]) RULE external list b RULE list b SHOW(?sinceNow=? toSeconds(current_timestamp-modification_time) ) RULE external list c RULE list c SHOW(?sinceUnixEpoch=? toUnixSeconds(modification_time) ) The following method is also supported: define(access_age_in_days,( INTEGER(( (CURRENT_TIMESTAMP - ACCESS_TIME) SECONDS)) /(24*3600.0) ) ) RULE external list w exec ?? RULE list w weight(access_age_in_days) show(access_age_in_days) --marc of GPFS -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Mon Jul 31 19:46:53 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 31 Jul 2017 14:46:53 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: <20170731144653.160355y5whmerokd@support.scinet.utoronto.ca> Renar For as long as the usage is below the hard limit (space or inodes) and below the grace period you'll be able to write. I don't think you can set the grace period to an specific value as a quota parameter, such as none. That is set at the filesystem creation time. BTW, grace period limit has been a mystery to me for many years. My impression is that GPFS keeps changing it internally depending on the position of the moon. I think ours is 2 hours, but at times I can see users writing for longer. Jaime Quoting "Grunenberg, Renar" : > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the > enforcement of hardlimit definitions on a flieset quota. What we see > is we put some 200 GB files on following quota definitions: quota > 150 GB Limit 250 GB Grace none. > After the creating of one 200 GB we hit the softquota limit, thats > ok. But After the the second file was created!! we expect an io > error but it don?t happen. We define all well know Parameters > (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota > are already running at first. > Regards Renar. > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
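To make the soft limit / hard limit / grace interplay concrete, here is a minimal sketch of setting and checking a fileset block quota; the device name fs1 and fileset name projA are placeholders, and the syntax assumes the newer mmsetquota form:

# soft limit 150 GB, hard limit 250 GB for the fileset
mmsetquota fs1:projA --block 150G:250G
# show usage, limits, in_doubt and grace for that fileset
mmlsquota -j projA fs1

Writes are only refused once the hard limit is reached, or once the grace period has expired while usage sits above the soft limit; in between, the fileset is merely over quota and the grace clock is running.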
From Renar.Grunenberg at huk-coburg.de Mon Jul 31 20:04:56 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 19:04:56 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hallo J. Eric, hallo Jaime, OK, after we hit the soft limit we see that the grace period goes to 7 days. I think that's the default. But what does it mean? After we reach the "hard" limit we additionally see the GBytes in_doubt. My interpretation is that we can now write many GB, up to the no-space-left event in the filesystem. But our intention is to restrict some applications to write only up to the hard limit in the fileset. Any hints on how to accomplish this? Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr.
9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 31 20:21:46 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 31 Jul 2017 19:21:46 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar, I?m sure this is the case, but I don?t see anywhere in this thread where this is explicitly stated ? you?re not doing your tests as root, are you? root, of course, is not bound by any quotas. Kevin On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > wrote: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. 
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Mon Jul 31 20:30:20 2017 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 31 Jul 2017 19:30:20 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: Hallo Kevin, thanks for your hint i will check these tomorrow, and yes as root, lol. Regards Renar Renar Grunenberg Abteilung Informatik ? 
Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Buterbaugh, Kevin L Gesendet: Montag, 31. Juli 2017 21:22 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar, I?m sure this is the case, but I don?t see anywhere in this thread where this is explicitly stated ? you?re not doing your tests as root, are you? root, of course, is not bound by any quotas. Kevin On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > wrote: Hallo J. Eric, hallo Jaime, Ok after we hit the softlimit we see that the graceperiod are go to 7 days. I think that?s the default. But was does it mean. After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. My interpretation now we can write many gb to the nospace-left event in the filesystem. But our intention is to restricted some application to write only to the hardlimit in the fileset. Any hints to accomplish this? Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. 
If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric Wonderley Gesendet: Montag, 31. Juli 2017 19:55 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement Hi Renar: What does 'mmlsquota -j fileset filesystem' report? I did not think you would get a grace period of none unless the hardlimit=softlimit. On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > wrote: Hallo All, we are on Version 4.2.3.2 and see some missunderstandig in the enforcement of hardlimit definitions on a flieset quota. What we see is we put some 200 GB files on following quota definitions: quota 150 GB Limit 250 GB Grace none. After the creating of one 200 GB we hit the softquota limit, thats ok. But After the the second file was created!! we expect an io error but it don?t happen. We define all well know Parameters (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota are already running at first. Regards Renar. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Mon Jul 31 21:03:53 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 31 Jul 2017 16:03:53 -0400 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> Message-ID: <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> In addition, the in_doubt column is a function of the data turn-over and the internal gpfs accounting synchronization period (beyond root control). The higher the in_doubt values, the less accurate the reported amount of space/inodes a user/group/fileset has in the filesystem. What I noticed in practice is that the in_doubt values only get worse over time, and work against the quotas, making users hit the limits sooner. Therefore, you may wish to run an 'mmcheckquota' crontab job once or twice a day, to reset the in_doubt column to zero more often. GPFS has a very high lag doing this on its own in the most recent versions, and seldom really catches up on a very active filesystem. If your grace period is set to 7 days I can assure you that in an HPC environment it is effectively the equivalent of not having quotas. You should set it to 2 hours or 4 hours. In an environment such as ours a runaway process can easily generate 500TB of data or 1 billion inodes in a few hours, and choke the file system for all users/jobs. Jaime Quoting "Buterbaugh, Kevin L" : > Hi Renar, > > I?m sure this is the case, but I don?t see anywhere in this thread > where this is explicitly stated ? you?re not doing your tests as > root, are you? root, of course, is not bound by any quotas. > > Kevin > > On Jul 31, 2017, at 2:04 PM, Grunenberg, Renar > > > wrote: > > > Hallo J. Eric, hallo Jaime, > Ok after we hit the softlimit we see that the graceperiod are go to > 7 days. I think that?s the default. But was does it mean. > After we reach the ?hard?-limit. we see additionaly the gbytes in_doubt. > My interpretation now we can write many gb to the nospace-left event > in the filesystem. > But our intention is to restricted some application to write only to > the hardlimit in the fileset. Any hints to accomplish this? > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information.
> Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > > > > Von: > gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von J. Eric > Wonderley > Gesendet: Montag, 31. Juli 2017 19:55 > An: gpfsug main discussion list > > > Betreff: Re: [gpfsug-discuss] Quota and hardlimit enforcement > > Hi Renar: > What does 'mmlsquota -j fileset filesystem' report? > I did not think you would get a grace period of none unless the > hardlimit=softlimit. > > On Mon, Jul 31, 2017 at 1:44 PM, Grunenberg, Renar > > > wrote: > Hallo All, > we are on Version 4.2.3.2 and see some missunderstandig in the > enforcement of hardlimit definitions on a flieset quota. What we see > is we put some 200 GB files on following quota definitions: quota > 150 GB Limit 250 GB Grace none. > After the creating of one 200 GB we hit the softquota limit, thats > ok. But After the the second file was created!! we expect an io > error but it don?t happen. We define all well know Parameters > (-Q,..) on the filesystem . Is this a bug or a Feature? mmcheckquota > are already running at first. > Regards Renar. > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > > Telefon: > > 09561 96-44110 > > Telefax: > > 09561 96-44104 > > E-Mail: > > Renar.Grunenberg at huk-coburg.de > > Internet: > > www.huk.de > > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. > Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel > Thomas (stv.). > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht > irrt?mlich erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser > Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this > information in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material > in this information is strictly forbidden. > ________________________________ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. 
(MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 31 21:11:14 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 31 Jul 2017 20:11:14 +0000 Subject: [gpfsug-discuss] Quota and hardlimit enforcement In-Reply-To: <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> References: <200a086c1740448da544e667c03887e5@SMXRF105.msg.hukrf.de> <20170731160353.54412s4i1r957eax@support.scinet.utoronto.ca> Message-ID: <3789811E-523F-47AE-93F3-E2985DD84D60@vanderbilt.edu> Jaime, That?s heavily workload dependent. We run a traditional HPC cluster and have a 7 day grace on home and 14 days on scratch. By setting the soft and hard limits appropriately we?ve slammed the door on many a runaway user / group / fileset. YMMV? Kevin On Jul 31, 2017, at 3:03 PM, Jaime Pinto > wrote: If your grace period is set to 7 days I can assure you that in an HPC environment it's the equivalent of not having quotas effectively. You should set it to 2 hours or 4 hours. In an environment such as ours a runway process can easily generate 500TB of data or 1 billion inodes in few hours, and choke the file system to all users/jobs. Jaime ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL:
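Following up on the mmcheckquota crontab suggestion earlier in this thread, a minimal sketch of a cron entry that reconciles the quota accounting (and with it the in_doubt values) twice a day; the device name fs1, the schedule and the choice of node are placeholders:

# /etc/cron.d/gpfs-checkquota on an admin/manager node: reconcile quota usage at 06:00 and 18:00
0 6,18 * * * root /usr/lpp/mmfs/bin/mmcheckquota fs1

mmcheckquota is I/O intensive on a large file system, so it is usually worth scheduling it off peak.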