From andreas.mattsson at maxiv.lu.se  Fri Jan  4 09:09:03 2019
From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson)
Date: Fri, 4 Jan 2019 09:09:03 +0000
Subject: [gpfsug-discuss] Filesystem access issues via CES NFS
In-Reply-To: <af909f36-46a5-fdd3-bc88-f0ba8f992f7a@science-computing.de>
References: <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se>
	<OFECA70069.3479DE09-ON0025834D.0039C75B-0025834D.0039E756@notes.na.collabserv.com>
	<9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se>,
	<af909f36-46a5-fdd3-bc88-f0ba8f992f7a@science-computing.de>
Message-ID: <dbbb31c88c4f46eab71c9e52db958a2a@maxiv.lu.se>

Just reporting back that the issue we had seems to have been solved. In our case it was fixed by applying hotfix-packages from IBM. Did this in December and I can no longer trigger the issue. Hopefully, it'll stay fixed when we get full production load on the system again now in January.

Also, as far as I can see, it looks like Scale 5.0.2.2 includes these packages already.


Regards,

Andreas mattsson

____________________________________________

[X]

Andreas Mattsson

Systems Engineer


MAX IV Laboratory
Lund University
P.O. Box 118, SE-221 00 Lund, Sweden
Visiting address: Fotongatan 2, 224 84 Lund
Mobile: +46 706 64 95 44
<mailto:andreas.mattsson at maxiv.se>andreas.mattsson at maxiv.lu.se<mailto:andreas.mattsson at maxiv.lu.se>
www.maxiv.se<http://www.maxiv.se/>

________________________________
Fr?n: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> f?r Ulrich Sibiller <u.sibiller at science-computing.de>
Skickat: den 13 december 2018 14:52:42
Till: gpfsug-discuss at spectrumscale.org
?mne: Re: [gpfsug-discuss] Filesystem access issues via CES NFS

On 23.11.2018 14:41, Andreas Mattsson wrote:
> Yes, this is repeating.
>
> We?ve ascertained that it has nothing to do at all with file operations on the GPFS side.
>
> Randomly throughout the filesystem mounted via NFS, ls or file access will give
>
> ?
>
>  > ls: reading directory /gpfs/filessystem/test/testdir: Invalid argument
>
> ?
>
> Trying again later might work on that folder, but might fail somewhere else.
>
> We have tried exporting the same filesystem via a standard kernel NFS instead of the CES
> Ganesha-NFS, and then the problem doesn?t exist.
>
> So it is definitely related to the Ganesha NFS server, or its interaction with the file system.
>  > Will see if I can get a tcpdump of the issue.

We see this, too. We cannot trigger it. Fortunately I have managed to capture some logs with
debugging enabled. I have now dug into the ganesha 2.5.3 code and I think the netgroup caching is
the culprit.

Here some FULL_DEBUG output:
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250]
export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for export id 1 path /gpfsexport
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match
:EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 (options=421021e2root_squash   , RWrw,
3--, ---, TCP, ----, Manage_Gids   , -- Deleg, anon_uid=    -2, anon_gid=    -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get
:DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match
:EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 (options=421021e2root_squash   , RWrw,
3--, ---, TCP, ----, Manage_Gids   , -- Deleg, anon_uid=    -2, anon_gid=    -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get
:DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match
:EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 (options=421021e2root_squash   , RWrw,
3--, ---, TCP, ----, Manage_Gids   , -- Deleg, anon_uid=    -2, anon_gid=    -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get
:DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250]
export_check_access :EXPORT :M_DBG :EXPORT          (options=03303002              ,     ,    ,
      ,               , -- Deleg,                ,                )
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250]
export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (options=42102002root_squash   , ----, 3--, ---,
TCP, ----, Manage_Gids   ,         , anon_uid=    -2, anon_gid=    -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250]
export_check_access :EXPORT :M_DBG :default options (options=03303002root_squash   , ----, 34-, UDP,
TCP, ----, No Manage_Gids, -- Deleg, anon_uid=    -2, anon_gid=    -2, none, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250]
export_check_access :EXPORT :M_DBG :Final options   (options=42102002root_squash   , ----, 3--, ---,
TCP, ----, Manage_Gids   , -- Deleg, anon_uid=    -2, anon_gid=    -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute
:DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to access Export_Id 1 /gpfsexport,
vers=3, proc=18

The client "client1" is definitely a member of the "netgroup1". But the NETGROUP_CLIENT lookups for
"netgroup2" and "netgroup3" can only happen if the netgroup caching code reports that "client1" is
NOT a member of "netgroup1".

I have also opened a support case at IBM for this.

@Malahal: Looks like you have written the netgroup caching code, feel free to ask for further
details if required.

Kind regards,

Ulrich Sibiller

--
Dipl.-Inf. Ulrich Sibiller           science + computing ag
System Administration                    Hagellocher Weg 73
                                     72070 Tuebingen, Germany
                           https://atos.net/de/deutschland/sc
--
Science + Computing AG
Vorstandsvorsitzender/Chairman of the board of management:
Dr. Martin Matzke
Vorstand/Board of Management:
Matthias Schempp, Sabine Hohenstein
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Philippe Miltin
Aufsichtsrat/Supervisory Board:
Martin Wibbe, Ursula Morgenstern
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190104/2084ad3b/attachment.htm>

From roblogie at au1.ibm.com  Tue Jan  8 21:49:51 2019
From: roblogie at au1.ibm.com (Rob Logie)
Date: Tue, 8 Jan 2019 21:49:51 +0000
Subject: [gpfsug-discuss] User Login Active Directory authentication on CES
	nodes with SMB protocol
Message-ID: <OFC65C4DA1.E1658C56-ON0025837C.0077EB6E-1546984191417@notes.na.collabserv.com>

Hi All

Is there a way to enable User Login Active Directory authentication on CES
nodes with SMB protocol that are joined to an AD domain. ?  The AD
authentication is working for access to the SMB shares, but not for user
login authentication on the CES nodes.

 
Thanks !

 
Regards,

Rob Logie
IT Specialist


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190108/24236064/attachment.htm>

From lgayne at us.ibm.com  Tue Jan  8 21:53:51 2019
From: lgayne at us.ibm.com (Lyle Gayne)
Date: Tue, 8 Jan 2019 16:53:51 -0500
Subject: [gpfsug-discuss] User Login Active Directory authentication on
	CES	nodes with SMB protocol
In-Reply-To: <OFC65C4DA1.E1658C56-ON0025837C.0077EB6E-1546984191417@notes.na.collabserv.com>
References: <OFC65C4DA1.E1658C56-ON0025837C.0077EB6E-1546984191417@notes.na.collabserv.com>
Message-ID: <OFE8BB25EF.C3F238EA-ON0025837C.00783DD0-8525837C.00784976@notes.na.collabserv.com>


Adding Ingo Meents for response


From:	"Rob Logie" <roblogie at au1.ibm.com>
To:	gpfsug-discuss at spectrumscale.org
Date:	01/08/2019 04:50 PM
Subject:	[gpfsug-discuss] User Login Active Directory authentication on
            CES	nodes with SMB protocol
Sent by:	gpfsug-discuss-bounces at spectrumscale.org


Hi All
Is there a way to enable User Login Active Directory authentication on CES
nodes with SMB protocol that are joined to an AD domain. ?  The AD
authentication is working for access to the SMB shares, but not for user
login authentication on the CES nodes.


Thanks !


 Regards,                                                                                   
 Rob Logie                                                                                  
 IT Specialist                                                                              
                                                                                            
                                                                                            
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190108/b05a7781/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190108/b05a7781/attachment.gif>

From arc at b4restore.com  Wed Jan  9 10:25:13 2019
From: arc at b4restore.com (Andi Rhod Christiansen)
Date: Wed, 9 Jan 2019 10:25:13 +0000
Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation.
Message-ID: <aa0c0d511f584364b7e33a7db9f63f9e@B4RWEX01.internal.b4restore.com>

Hi,

I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services?

If it is possible it would be great to hear pros and cons about doing this ?

Thanks in advance!

Venlig hilsen / Best Regards

Andi Christiansen
IT Solution Specialist


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190109/54d69147/attachment.htm>

From abeattie at au1.ibm.com  Wed Jan  9 11:16:49 2019
From: abeattie at au1.ibm.com (Andrew Beattie)
Date: Wed, 9 Jan 2019 11:16:49 +0000
Subject: [gpfsug-discuss] Spectrum Scale protocol node service
	separation.
In-Reply-To: <aa0c0d511f584364b7e33a7db9f63f9e@B4RWEX01.internal.b4restore.com>
References: <aa0c0d511f584364b7e33a7db9f63f9e@B4RWEX01.internal.b4restore.com>
Message-ID: <OF9559B450.89F5DE3F-ON0025837D.003DD227-0025837D.003DF6F6@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190109/94167a28/attachment.htm>

From S.J.Thompson at bham.ac.uk  Wed Jan  9 12:19:30 2019
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Wed, 9 Jan 2019 12:19:30 +0000
Subject: [gpfsug-discuss] Spectrum Scale protocol node service
 separation.
Message-ID: <5ABB423F-71AF-4469-9FDA-589EA8872B86@bham.ac.uk>

You have to run all services on all nodes ( ? ) actually its technically possible to remove the packages once protocols is running on the node, but next time you reboot the node, it will get marked unhealthy and you spend an hour working out why? <AHEM>

But what we do to split load is have different IPs assigned to different CES groups and then assign the SMB nodes to the SMB group IPs etc ?

Technically a user could still connect to the NFS (in our case) IPs with SMB protocol, but there?s not a lot we can do about that ? though our upstream firewall drops said traffic.

Simon

From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "arc at b4restore.com" <arc at b4restore.com>
Reply-To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date: Wednesday, 9 January 2019 at 10:31
To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation.

Hi,

I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services?

If it is possible it would be great to hear pros and cons about doing this ?

Thanks in advance!

Venlig hilsen / Best Regards

Andi Christiansen
IT Solution Specialist


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190109/03435dfb/attachment.htm>

From arc at b4restore.com  Wed Jan  9 13:23:17 2019
From: arc at b4restore.com (Andi Rhod Christiansen)
Date: Wed, 9 Jan 2019 13:23:17 +0000
Subject: [gpfsug-discuss] Spectrum Scale protocol node
	service	separation.
In-Reply-To: <OF9559B450.89F5DE3F-ON0025837D.003DD227-0025837D.003DF6F6@notes.na.collabserv.com>
References: <aa0c0d511f584364b7e33a7db9f63f9e@B4RWEX01.internal.b4restore.com>
	<OF9559B450.89F5DE3F-ON0025837D.003DD227-0025837D.003DF6F6@notes.na.collabserv.com>
Message-ID: <1886db2cdf074bf0aaa151c395d300d5@B4RWEX01.internal.b4restore.com>

Hi Andrew,

Where can I request such a feature? ?

Venlig hilsen / Best Regards

Andi Rhod Christiansen

Fra: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> P? vegne af Andrew Beattie
Sendt: 9. januar 2019 12:17
Til: gpfsug-discuss at spectrumscale.org
Cc: gpfsug-discuss at spectrumscale.org
Emne: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation.

Andi,

All the CES nodes in the same cluster will share the same protocol exports
if you want to separate them you need to create remote mount clusters and export the additional protocols via the remote mount

it would actually be a useful RFE to have the ablity to create CES groups attached to the base cluster and by group create exports of different protocols, but its not available today.
Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: abeattie at au1.ibm.com<mailto:abeattie at au1.ibm.com>


----- Original message -----
From: Andi Rhod Christiansen <arc at b4restore.com<mailto:arc at b4restore.com>>
Sent by: gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Cc:
Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation.
Date: Wed, Jan 9, 2019 8:31 PM


Hi,


I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services?


If it is possible it would be great to hear pros and cons about doing this ?


Thanks in advance!


Venlig hilsen / Best Regards

Andi Christiansen
IT Solution Specialist


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190109/bc6432f2/attachment.htm>

From arc at b4restore.com  Wed Jan  9 13:24:30 2019
From: arc at b4restore.com (Andi Rhod Christiansen)
Date: Wed, 9 Jan 2019 13:24:30 +0000
Subject: [gpfsug-discuss] Spectrum Scale protocol node service
 separation.
In-Reply-To: <5ABB423F-71AF-4469-9FDA-589EA8872B86@bham.ac.uk>
References: <5ABB423F-71AF-4469-9FDA-589EA8872B86@bham.ac.uk>
Message-ID: <a92739a568ec4e73b68de624127d2319@B4RWEX01.internal.b4restore.com>

Hi Simon,

It was actually also the only solution I found if I want to keep them within the same cluster ?

Thanks for the reply, I will see what we figure out !

Venlig hilsen / Best Regards

Andi Rhod Christiansen

Fra: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> P? vegne af Simon Thompson
Sendt: 9. januar 2019 13:20
Til: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Emne: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation.

You have to run all services on all nodes ( ? ) actually its technically possible to remove the packages once protocols is running on the node, but next time you reboot the node, it will get marked unhealthy and you spend an hour working out why? <AHEM>

But what we do to split load is have different IPs assigned to different CES groups and then assign the SMB nodes to the SMB group IPs etc ?

Technically a user could still connect to the NFS (in our case) IPs with SMB protocol, but there?s not a lot we can do about that ? though our upstream firewall drops said traffic.

Simon

From: <gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>> on behalf of "arc at b4restore.com<mailto:arc at b4restore.com>" <arc at b4restore.com<mailto:arc at b4restore.com>>
Reply-To: "gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>" <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Date: Wednesday, 9 January 2019 at 10:31
To: "gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>" <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation.

Hi,

I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services?

If it is possible it would be great to hear pros and cons about doing this ?

Thanks in advance!

Venlig hilsen / Best Regards

Andi Christiansen
IT Solution Specialist


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190109/83819399/attachment.htm>

From Paul.Sanchez at deshaw.com  Wed Jan  9 14:05:48 2019
From: Paul.Sanchez at deshaw.com (Sanchez, Paul)
Date: Wed, 9 Jan 2019 14:05:48 +0000
Subject: [gpfsug-discuss] Spectrum Scale protocol node service
	separation.
In-Reply-To: <aa0c0d511f584364b7e33a7db9f63f9e@B4RWEX01.internal.b4restore.com>
References: <aa0c0d511f584364b7e33a7db9f63f9e@B4RWEX01.internal.b4restore.com>
Message-ID: <53ec54bb621242109a789e51d61b1377@mbxtoa1.winmail.deshaw.com>

The docs say: ?CES supports the following export protocols: NFS, SMB, object, and iSCSI (block). Each protocol can be enabled or disabled in the cluster. If a protocol is enabled in the CES cluster, all CES nodes serve that protocol.? Which would seem to indicate that the answer is ?no?.

This kind of thing is another good reason to license Scale by storage capacity rather than by sockets (PVU).  This approach was already a good idea due to the flexibility it allows to scale manager, quorum, and NSD server nodes for performance and high-availability without affecting your software licensing costs.  This can result in better design and the flexibility to more quickly respond to new problems by adding server nodes.

So assuming you?re not on the old PVU licensing model, it is trivial to deploy as many gateway nodes as needed to separate these into distinct remote clusters.  You can create an object gateway cluster, and a CES gateway cluster each which only mounts and exports what is necessary.  You can even virtualize these servers and host them on the same hardware, if you?re into that.

-Paul

From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Andi Rhod Christiansen
Sent: Wednesday, January 9, 2019 5:25 AM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation.

Hi,

I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services?

If it is possible it would be great to hear pros and cons about doing this ?

Thanks in advance!

Venlig hilsen / Best Regards

Andi Christiansen
IT Solution Specialist

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190109/7f9ad3f8/attachment.htm>

From S.J.Thompson at bham.ac.uk  Wed Jan  9 16:35:37 2019
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Wed, 9 Jan 2019 16:35:37 +0000
Subject: [gpfsug-discuss] Spectrum Scale protocol node
	service	separation.
In-Reply-To: <53ec54bb621242109a789e51d61b1377@mbxtoa1.winmail.deshaw.com>
References: <aa0c0d511f584364b7e33a7db9f63f9e@B4RWEX01.internal.b4restore.com>,
	<53ec54bb621242109a789e51d61b1377@mbxtoa1.winmail.deshaw.com>
Message-ID: <CF45EE16DEF2FE4B9AA7FF2B6EE265450129793888@EX13.adf.bham.ac.uk>

I think only recently was remote cluster support added (though we have been doing it since CES was released).

I agree that capacity licenses have freed us to implement a better solution.. no longer do we run quorum/token managers on nsd nodes to reduce socket costs.

I believe socket based licenses are also about to or already no longer available for new customers (existing customers can continue to buy).

Carl can probably comment on this?

Simon
________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Paul.Sanchez at deshaw.com [Paul.Sanchez at deshaw.com]
Sent: 09 January 2019 14:05
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service      separation.

The docs say: ?CES supports the following export protocols: NFS, SMB, object, and iSCSI (block). Each protocol can be enabled or disabled in the cluster. If a protocol is enabled in the CES cluster, all CES nodes serve that protocol.? Which would seem to indicate that the answer is ?no?.

This kind of thing is another good reason to license Scale by storage capacity rather than by sockets (PVU).  This approach was already a good idea due to the flexibility it allows to scale manager, quorum, and NSD server nodes for performance and high-availability without affecting your software licensing costs.  This can result in better design and the flexibility to more quickly respond to new problems by adding server nodes.

So assuming you?re not on the old PVU licensing model, it is trivial to deploy as many gateway nodes as needed to separate these into distinct remote clusters.  You can create an object gateway cluster, and a CES gateway cluster each which only mounts and exports what is necessary.  You can even virtualize these servers and host them on the same hardware, if you?re into that.

-Paul

From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Andi Rhod Christiansen
Sent: Wednesday, January 9, 2019 5:25 AM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation.

Hi,

I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services?

If it is possible it would be great to hear pros and cons about doing this ?

Thanks in advance!

Venlig hilsen / Best Regards

Andi Christiansen
IT Solution Specialist


From aspalazz at us.ibm.com  Wed Jan  9 17:21:03 2019
From: aspalazz at us.ibm.com (Aaron S Palazzolo)
Date: Wed, 9 Jan 2019 17:21:03 +0000
Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation
In-Reply-To: <mailman.42.1547043146.1125.gpfsug-discuss@spectrumscale.org>
Message-ID: <OFE9E2E24B.D50D18A9-ON0025837D.005D7A26-0025837D.005F4FC8@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190109/4e1aafb6/attachment.htm>

From S.J.Thompson at bham.ac.uk  Wed Jan  9 18:04:47 2019
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Wed, 9 Jan 2019 18:04:47 +0000
Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation
In-Reply-To: <OFE9E2E24B.D50D18A9-ON0025837D.005D7A26-0025837D.005F4FC8@notes.na.collabserv.com>
References: <mailman.42.1547043146.1125.gpfsug-discuss@spectrumscale.org>,
	<OFE9E2E24B.D50D18A9-ON0025837D.005D7A26-0025837D.005F4FC8@notes.na.collabserv.com>
Message-ID: <CF45EE16DEF2FE4B9AA7FF2B6EE2654501297938FA@EX13.adf.bham.ac.uk>

Can you use node affinity within CES groups?

For example I have some shiny new servers I want to normally use. If I plan maintenance, I move the IP to another shiny box. But I also have some old off support legacy hardware that I'm happy to use in a DR situation (e.g. they are in another site). So I want a group for my SMB boxes and NFS boxes, but have affinity normally, and then have old hardware in case of failure.

Whilst we're on protocols, are there any restrictions on using mixed architectures? I don't recall seeing this but... E.g. my new shiny boxes are ppc64le systems and my old legacy nodes are x86. It's all ctdb locking right .. (ok maybe mixing be and le hosts would be bad)

(Sure I'll take a performance hit when I fail to the old nodes, but that is better than no service).

Simon
________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of aspalazz at us.ibm.com [aspalazz at us.ibm.com]
Sent: 09 January 2019 17:21
To: gpfsug-discuss at spectrumscale.org
Cc: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation

Hey guys - I wanted to reply from the Scale development side.....

First off, consider CES as a stack and the implications of such:
- all protocols are installed on all nodes
- if a specific protocol is enabled (SMB, NFS, OBJ, Block), it's enabled for all protocol nodes
- if a specific protocol is started (SMB, NFS, OBJ, Block), it's started on all nodes by default, unless manually specified.

As was indicated in the e-mail chain, you don't want to be removing rpms to create a subset of nodes serving various protocols as this will cause overall issues.  You also don't want to manually be disabling protocols on some nodes/not others in order to achieve nodes that are 'only serving' SMB, for instance.  Doing this manual stopping/starting of protocols isn't something that will adhere to failover.

===============================================================
A few possible solutions if you want to segregate protocols to specific nodes are:
===============================================================
1) CES-Groups in combination with specific IPs / DNS hostnames that correspond to each protocol.
- As mentioned, this can still be bypassed if someone attempts a mount using an IP/DNS name not set for their protocol.  However, you could probably prevent some of this with an external firewall rule.
- Using CES-Groups confines the IPs/DNS hostnames to very specific nodes

2) Firewall rules
- This is best if done external to the cluster, and at a level that can restrict specific protocol traffic to specific IPs/hostnames
- combine this with #1 for the best results.
- Although it may work, try to stay away from crazy firewall rules on each protocol node itself as this can get confusing very quickly.  It's easier if you can set this up external to the nodes.

3) Similar to above but using Node Affinity CES-IP policy - but no CES groups.
- Upside is node-affinity will attempt to keep your CES-IPs associated with specific nodes.  So if you restrict specific protocol traffic to specific IPs, then they'll stay on nodes you designate
- Watch out for failovers.  In error cases (or upgrades) where an IP needs to move to another node, it obviously can't remain on the node that's having issues.  This means you may have protocol trafffic crossover when this occurs.

4) A separate remote cluster for each CES protocol
- In this example, you could make fairly small remote clusters (although we recommend 2->3nodes at least for failover purposes).  The local cluster would provide the storage.  The remote clusters would mount it.  One remote cluster could have only SMB enabled.  Another remote cluster could have only OBJ enabled.  etc...

------
I hope this helps a bit....


Regards,

Aaron Palazzolo
IBM Spectrum Scale Deployment, Infrastructure, Virtualization
9042 S Rita Road, Tucson AZ 85744
Phone: 520-799-5161, T/L: 321-5161
E-mail: aspalazz at us.ibm.com


----- Original message -----
From: gpfsug-discuss-request at spectrumscale.org
Sent by: gpfsug-discuss-bounces at spectrumscale.org
To: gpfsug-discuss at spectrumscale.org
Cc:
Subject: gpfsug-discuss Digest, Vol 84, Issue 4
Date: Wed, Jan 9, 2019 7:13 AM

Send gpfsug-discuss mailing list submissions to
gpfsug-discuss at spectrumscale.org

To subscribe or unsubscribe via the World Wide Web, visit
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
or, via email, send a message with subject or body 'help' to
gpfsug-discuss-request at spectrumscale.org

You can reach the person managing the list at
gpfsug-discuss-owner at spectrumscale.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of gpfsug-discuss digest..."


Today's Topics:

   1. Re: Spectrum Scale protocol node service separation.
      (Andi Rhod Christiansen)
   2. Re: Spectrum Scale protocol node service separation.
      (Sanchez, Paul)


----------------------------------------------------------------------

Message: 1
Date: Wed, 9 Jan 2019 13:24:30 +0000
From: Andi Rhod Christiansen <arc at b4restore.com>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service
separation.
Message-ID:
<a92739a568ec4e73b68de624127d2319 at B4RWEX01.internal.b4restore.com>
Content-Type: text/plain; charset="utf-8"

Hi Simon,

It was actually also the only solution I found if I want to keep them within the same cluster ?

Thanks for the reply, I will see what we figure out !

Venlig hilsen / Best Regards

Andi Rhod Christiansen

Fra: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> P? vegne af Simon Thompson
Sendt: 9. januar 2019 13:20
Til: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Emne: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation.

You have to run all services on all nodes ( ? ) actually its technically possible to remove the packages once protocols is running on the node, but next time you reboot the node, it will get marked unhealthy and you spend an hour working out why? <AHEM>

But what we do to split load is have different IPs assigned to different CES groups and then assign the SMB nodes to the SMB group IPs etc ?

Technically a user could still connect to the NFS (in our case) IPs with SMB protocol, but there?s not a lot we can do about that ? though our upstream firewall drops said traffic.

Simon

From: <gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>> on behalf of "arc at b4restore.com<mailto:arc at b4restore.com>" <arc at b4restore.com<mailto:arc at b4restore.com>>
Reply-To: "gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>" <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Date: Wednesday, 9 January 2019 at 10:31
To: "gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>" <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation.

Hi,

I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services?

If it is possible it would be great to hear pros and cons about doing this ?

Thanks in advance!

Venlig hilsen / Best Regards

Andi Christiansen
IT Solution Specialist


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20190109/83819399/attachment-0001.html>

------------------------------

Message: 2
Date: Wed, 9 Jan 2019 14:05:48 +0000
From: "Sanchez, Paul" <Paul.Sanchez at deshaw.com>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service
separation.
Message-ID:
<53ec54bb621242109a789e51d61b1377 at mbxtoa1.winmail.deshaw.com>
Content-Type: text/plain; charset="utf-8"

The docs say: ?CES supports the following export protocols: NFS, SMB, object, and iSCSI (block). Each protocol can be enabled or disabled in the cluster. If a protocol is enabled in the CES cluster, all CES nodes serve that protocol.? Which would seem to indicate that the answer is ?no?.

This kind of thing is another good reason to license Scale by storage capacity rather than by sockets (PVU).  This approach was already a good idea due to the flexibility it allows to scale manager, quorum, and NSD server nodes for performance and high-availability without affecting your software licensing costs.  This can result in better design and the flexibility to more quickly respond to new problems by adding server nodes.

So assuming you?re not on the old PVU licensing model, it is trivial to deploy as many gateway nodes as needed to separate these into distinct remote clusters.  You can create an object gateway cluster, and a CES gateway cluster each which only mounts and exports what is necessary.  You can even virtualize these servers and host them on the same hardware, if you?re into that.

-Paul

From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Andi Rhod Christiansen
Sent: Wednesday, January 9, 2019 5:25 AM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation.

Hi,

I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services?

If it is possible it would be great to hear pros and cons about doing this ?

Thanks in advance!

Venlig hilsen / Best Regards

Andi Christiansen
IT Solution Specialist

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20190109/7f9ad3f8/attachment.html>

------------------------------

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


End of gpfsug-discuss Digest, Vol 84, Issue 4
*********************************************


From christof.schmitt at us.ibm.com  Wed Jan  9 18:10:13 2019
From: christof.schmitt at us.ibm.com (Christof Schmitt)
Date: Wed, 9 Jan 2019 18:10:13 +0000
Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation
In-Reply-To: <CF45EE16DEF2FE4B9AA7FF2B6EE2654501297938FA@EX13.adf.bham.ac.uk>
References: <CF45EE16DEF2FE4B9AA7FF2B6EE2654501297938FA@EX13.adf.bham.ac.uk>,
	<mailman.42.1547043146.1125.gpfsug-discuss@spectrumscale.org>,
	<OFE9E2E24B.D50D18A9-ON0025837D.005D7A26-0025837D.005F4FC8@notes.na.collabserv.com>
Message-ID: <OF8E323DFE.B039C4DC-ON0025837D.0063C4B8-0025837D.0063D021@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190109/490766e3/attachment.htm>

From christof.schmitt at us.ibm.com  Wed Jan  9 19:03:25 2019
From: christof.schmitt at us.ibm.com (Christof Schmitt)
Date: Wed, 9 Jan 2019 19:03:25 +0000
Subject: [gpfsug-discuss] User Login Active Directory authentication
	on	CES	nodes with SMB protocol
In-Reply-To: <OFE8BB25EF.C3F238EA-ON0025837C.00783DD0-8525837C.00784976@notes.na.collabserv.com>
References: <OFE8BB25EF.C3F238EA-ON0025837C.00783DD0-8525837C.00784976@notes.na.collabserv.com>,
	<OFC65C4DA1.E1658C56-ON0025837C.0077EB6E-1546984191417@notes.na.collabserv.com>
Message-ID: <OFD8512186.EE82F1AA-ON0025837D.00686283-0025837D.0068AF25@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190109/bd8d81d9/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.1__=8FBB09EFDFEBBB408f9e8a93df938690918c8FB at .gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190109/bd8d81d9/attachment.gif>

From carlz at us.ibm.com  Wed Jan  9 19:19:20 2019
From: carlz at us.ibm.com (Carl Zetie)
Date: Wed, 9 Jan 2019 19:19:20 +0000
Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation
Message-ID: <OFEAABB085.30E31E69-ON0025837D.00686CEF-0025837D.006A2423@notes.na.collabserv.com>

ST>I believe socket based licenses are also about to or already no longer available 
ST>for new customers (existing customers can continue to buy).

ST>Carl can probably comment on this?
 
That is correct. Friday Jan 11 is the last chance for *new* customers to buy Standard Edition sockets. 
 
And as Simon says, those of you who are currently Sockets customers can remain on Sockets, buying additional licenses and renewing existing licenses. (IBM Legal requires me to add, any statement about the future is an intention, not a commitment -- but, as I've said before, as long as it's my decision to make, my intent is to keep Sockets as long as existing customers want them). 

And yes, one of the reasons I wanted to get away from Socket pricing is the kind of scenarios some of you brought up. Implementing the best deployment topology for your needs shouldn't be a licensing transaction. (Don't even get me started on client licenses).
 
 
regards,

 
Carl Zetie  
Program Director  
Offering Management for Spectrum Scale, IBM  
----  
(540) 882 9353 ][ Research Triangle Park
 carlz at us.ibm.com             


From cblack at nygenome.org  Wed Jan  9 19:11:40 2019
From: cblack at nygenome.org (Christopher Black)
Date: Wed, 9 Jan 2019 19:11:40 +0000
Subject: [gpfsug-discuss] User Login Active Directory authentication on
 CES nodes with SMB protocol
In-Reply-To: <OFD8512186.EE82F1AA-ON0025837D.00686283-0025837D.0068AF25@notes.na.collabserv.com>
References: <OFE8BB25EF.C3F238EA-ON0025837C.00783DD0-8525837C.00784976@notes.na.collabserv.com>
	<OFC65C4DA1.E1658C56-ON0025837C.0077EB6E-1546984191417@notes.na.collabserv.com>
	<OFD8512186.EE82F1AA-ON0025837D.00686283-0025837D.0068AF25@notes.na.collabserv.com>
Message-ID: <7399F5C1-A23F-4852-B912-0965E111D191@nygenome.org>

We use realmd and some automation for sssd configs to get linux hosts to have local login and ssh tied to AD accounts, however we do not apply these configs on our protocol nodes.

From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Christof Schmitt <christof.schmitt at us.ibm.com>
Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Wednesday, January 9, 2019 at 2:03 PM
To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Cc: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>, Ingo Meents <MEENTS at de.ibm.com>
Subject: Re: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol

There is the PAM module that would forward authentication requests to winbindd:
/usr/lpp/mmfs/lib64/security/pam_gpfs-winbind.so
In theory that can be added to the PAM configuration in /etc/pam.d/. On the other hand, we have never tested this nor claimed support, so there might be reasons why this won't work.

Other customers have configured sssd manually in addition to the Scale authentication to allow user logon and authentication for sudo.

If the request here is to configure AD authentication through mmuserauth and that should also provide user logon, that should probably be treated as a feature request through RFE.

Regards,

Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ
christof.schmitt at us.ibm.com  ||  +1-520-799-2469    (T/L: 321-2469)


----- Original message -----
From: "Lyle Gayne" <lgayne at us.ibm.com>
Sent by: gpfsug-discuss-bounces at spectrumscale.org
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Cc: Ingo Meents <MEENTS at de.ibm.com>
Subject: Re: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol
Date: Tue, Jan 8, 2019 2:54 PM


Adding Ingo Meents for response

[Inactive hide details for "Rob Logie" ---01/08/2019 04:50:22 PM---Hi All Is there a way to enable User Login Active Directory a]"Rob Logie" ---01/08/2019 04:50:22 PM---Hi All Is there a way to enable User Login Active Directory authentication on CES

From: "Rob Logie" <roblogie at au1.ibm.com>
To: gpfsug-discuss at spectrumscale.org
Date: 01/08/2019 04:50 PM
Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol
Sent by: gpfsug-discuss-bounces at spectrumscale.org

________________________________


Hi All
Is there a way to enable User Login Active Directory authentication on CES nodes with SMB protocol that are joined to an AD domain. ? The AD authentication is working for access to the SMB shares, but not for user login authentication on the CES nodes.


Thanks !


Regards,
Rob Logie
IT Specialist


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss<https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwMFaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=-xC5HBbNzLewkCoWiX54NDV2Ot9cHR8JqqV263Adf6A&s=0hU9OcUPXitAEavSzopApCsO0Or1PRmKCRO9SHr50o0&e=>


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss<https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwMFaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=-xC5HBbNzLewkCoWiX54NDV2Ot9cHR8JqqV263Adf6A&s=0hU9OcUPXitAEavSzopApCsO0Or1PRmKCRO9SHr50o0&e=>


________________________________
This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190109/1a528981/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 106 bytes
Desc: image001.gif
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190109/1a528981/attachment.gif>

From Kevin.Buterbaugh at Vanderbilt.Edu  Tue Jan  8 22:12:22 2019
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Tue, 8 Jan 2019 22:12:22 +0000
Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?
Message-ID: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu>

Hi All,

Happy New Year to all!  Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best!

Is there a way to get a list of the filesets in a filesystem without running mmlsfileset?  I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done.

The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem.  There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads.  That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all.

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine.

?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190108/0ec9ccbd/attachment.htm>

From skylar2 at uw.edu  Wed Jan  9 21:37:04 2019
From: skylar2 at uw.edu (Skylar Thompson)
Date: Wed, 9 Jan 2019 21:37:04 +0000
Subject: [gpfsug-discuss] Get list of filesets _without_ running
 mmlsfileset?
In-Reply-To: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu>
References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu>
Message-ID: <20190109213704.bbbqbuqzkrotcjpu@utumno.gs.washington.edu>

I suppose you could run the underlying tslsfileset, though that's probably
not the answer you're looking for.

Out of curiousity, what are you hoping to gain by not running mmlsfileset?
Is the problem scaling due to the number of filesets that you have defined?

On Tue, Jan 08, 2019 at 10:12:22PM +0000, Buterbaugh, Kevin L wrote:
> Hi All,
> 
> Happy New Year to all!  Personally, I???ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I???m referring to), but I certainly wish all of you the best!
> 
> Is there a way to get a list of the filesets in a filesystem without running mmlsfileset?  I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven???t found them yet in the searching I???ve done.
> 
> The reason I???m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem.  There are obviously multiple issues with that, so the workaround we???re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads.  That???s sub-optimal for any day on which a fileset gets created or deleted, so I???m looking for a better way ??? one which doesn???t require root privileges and preferably doesn???t involve running a GPFS command at all.
> 
> Thanks in advance.
> 
> Kevin
> 
> P.S.  I am still working on metadata and iSCSI testing and will report back on that when complete.
> P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine.
> 
> ???
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and Education
> Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633
> 
> 
> 

> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-- 
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine


From S.J.Thompson at bham.ac.uk  Wed Jan  9 22:42:01 2019
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Wed, 9 Jan 2019 22:42:01 +0000
Subject: [gpfsug-discuss] Get list of filesets _without_ running
	mmlsfileset?
In-Reply-To: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu>
References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu>
Message-ID: <CF45EE16DEF2FE4B9AA7FF2B6EE265450129793978@EX13.adf.bham.ac.uk>

Hi Kevin,

Have you looked at the rest API?

https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_listofapicommands.htm

I don't know how much access control there is available in the API so not sure if you could lock some sort of service user down to just the get filesets command?

Simon
_______________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu]
Sent: 08 January 2019 22:12
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

Hi All,

Happy New Year to all!  Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best!

Is there a way to get a list of the filesets in a filesystem without running mmlsfileset?  I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done.

The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem.  There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads.  That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all.

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine.

?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633


From Paul.Sanchez at deshaw.com  Wed Jan  9 23:03:08 2019
From: Paul.Sanchez at deshaw.com (Sanchez, Paul)
Date: Wed, 9 Jan 2019 23:03:08 +0000
Subject: [gpfsug-discuss] Get list of filesets _without_ running
 mmlsfileset?
In-Reply-To: <20190109213704.bbbqbuqzkrotcjpu@utumno.gs.washington.edu>
References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu>
	<20190109213704.bbbqbuqzkrotcjpu@utumno.gs.washington.edu>
Message-ID: <3d408800d50648dfae25c3c95c1f04c1@mbxtoa1.winmail.deshaw.com>

You could also wrap whatever provisioning script you're using (the thing that runs mmcrfileset), which must already be running as root, so that it also updates the cached text file afterward.

-Paul

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Skylar Thompson
Sent: Wednesday, January 9, 2019 4:37 PM
To: Kevin.Buterbaugh at Vanderbilt.Edu
Cc: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

I suppose you could run the underlying tslsfileset, though that's probably not the answer you're looking for.

Out of curiousity, what are you hoping to gain by not running mmlsfileset?
Is the problem scaling due to the number of filesets that you have defined?

On Tue, Jan 08, 2019 at 10:12:22PM +0000, Buterbaugh, Kevin L wrote:
> Hi All,
> 
> Happy New Year to all!  Personally, I???ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I???m referring to), but I certainly wish all of you the best!
> 
> Is there a way to get a list of the filesets in a filesystem without running mmlsfileset?  I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven???t found them yet in the searching I???ve done.
> 
> The reason I???m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem.  There are obviously multiple issues with that, so the workaround we???re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads.  That???s sub-optimal for any day on which a fileset gets created or deleted, so I???m looking for a better way ??? one which doesn???t require root privileges and preferably doesn???t involve running a GPFS command at all.
> 
> Thanks in advance.
> 
> Kevin
> 
> P.S.  I am still working on metadata and iSCSI testing and will report back on that when complete.
> P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine.
> 
> ???
> Kevin Buterbaugh - Senior System Administrator Vanderbilt University - 
> Advanced Computing Center for Research and Education 
> Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu
> > - (615)875-9633
> 
> 
> 

> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


--
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine _______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


From Kevin.Buterbaugh at Vanderbilt.Edu  Wed Jan  9 23:07:00 2019
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Wed, 9 Jan 2019 23:07:00 +0000
Subject: [gpfsug-discuss] Get list of filesets _without_
	running	mmlsfileset?
In-Reply-To: <CF45EE16DEF2FE4B9AA7FF2B6EE265450129793978@EX13.adf.bham.ac.uk>
References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu>
	<CF45EE16DEF2FE4B9AA7FF2B6EE265450129793978@EX13.adf.bham.ac.uk>
Message-ID: <E3617DD4-EA9B-4C1D-A0BA-CF369F0F93E9@vanderbilt.edu>

Hi All,

Let me answer Skylar?s questions in another e-mail, which may also tell whether the rest API is a possibility or not.

The Python script in question is to display quota information for a user.  The mmlsquota command has a couple of issues:  1) its output is confusing to some of our users, 2) more significantly, it displays a ton of information that doesn?t apply to the user running it.  For example, it will display all the filesets in a filesystem whether or not the user has access to them.  So the Python script figures out what group(s) the user is a member of and only displays information pertinent to them (i.e. the group of the fileset junction path is a group this user is a member of) ? and in a simplified (and potentially colorized) output format.

And typing that preceding paragraph caused the lightbulb to go off ? I know the answer to my own question ? have the script run mmlsquota and get the full list of filesets from that, then parse that to determine which ones I actually need to display quota information for.  Thanks!

Kevin
?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633

On Jan 9, 2019, at 4:42 PM, Simon Thompson <S.J.Thompson at bham.ac.uk<mailto:S.J.Thompson at bham.ac.uk>> wrote:

Hi Kevin,

Have you looked at the rest API?

https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_5.0.2%2Fcom.ibm.spectrum.scale.v5r02.doc%2Fbl1adm_listofapicommands.htm&amp;data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&amp;sdata=uotWilntiZa2E9RIBE2ikhxxBm3Mk3y%2FW%2FKUHovaJpY%3D&amp;reserved=0

I don't know how much access control there is available in the API so not sure if you could lock some sort of service user down to just the get filesets command?

Simon
_______________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu]
Sent: 08 January 2019 22:12
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

Hi All,

Happy New Year to all!  Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best!

Is there a way to get a list of the filesets in a filesystem without running mmlsfileset?  I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done.

The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem.  There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads.  That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all.

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine.

?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&amp;data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&amp;sdata=WSijRrjhOgQyuWsh9K8ckpjf%2F2CkXfZW1n%2BJw5Gw5tw%3D&amp;reserved=0

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190109/f052e8e7/attachment.htm>

From abeattie at au1.ibm.com  Thu Jan 10 01:13:55 2019
From: abeattie at au1.ibm.com (Andrew Beattie)
Date: Thu, 10 Jan 2019 01:13:55 +0000
Subject: [gpfsug-discuss] Get list of filesets
	_without_runningmmlsfileset?
In-Reply-To: <E3617DD4-EA9B-4C1D-A0BA-CF369F0F93E9@vanderbilt.edu>
References: <E3617DD4-EA9B-4C1D-A0BA-CF369F0F93E9@vanderbilt.edu>,
	<30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu><CF45EE16DEF2FE4B9AA7FF2B6EE265450129793978@EX13.adf.bham.ac.uk>
Message-ID: <OF71EA19F2.D6CC0FE3-ON0025837E.0006C4AC-0025837E.0006C4B2@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190110/774f7cfa/attachment.htm>

From Kevin.Buterbaugh at Vanderbilt.Edu  Thu Jan 10 20:42:50 2019
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Thu, 10 Jan 2019 20:42:50 +0000
Subject: [gpfsug-discuss] Get list of
	filesets	_without_runningmmlsfileset?
In-Reply-To: <OF71EA19F2.D6CC0FE3-ON0025837E.0006C4AC-0025837E.0006C4B2@notes.na.collabserv.com>
References: <E3617DD4-EA9B-4C1D-A0BA-CF369F0F93E9@vanderbilt.edu>
	<30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu>
	<CF45EE16DEF2FE4B9AA7FF2B6EE265450129793978@EX13.adf.bham.ac.uk>
	<OF71EA19F2.D6CC0FE3-ON0025837E.0006C4AC-0025837E.0006C4B2@notes.na.collabserv.com>
Message-ID: <6A909228-87E7-468E-A51C-086B9C75BB18@vanderbilt.edu>

Hi Andrew / All,

Well, it does _sound_ useful, but in its current state it?s really not for several reasons, mainly having to do with it being coded in a moderately site-specific way.  It needs an overhaul anyway, so I?m going to look at getting rid of as much of that as possible (there?s some definite low-hanging fruit there) and, for the site-specific things that can?t be gotten rid of, maybe consolidating them into one place in the code so that the script could be more generally useful if you just change those values.

If I can accomplish those things, then yes, we?d be glad to share the script.

But I?ve also realized that I didn?t _entirely_ answer my original question.  Yes, mmlsquota will show me all the filesets ? but I also need to know the junction path for each of those filesets.  One of the main reasons we wrote this script in the first place is that if you run mmlsquota you see that you have no limits on about 60 filesets (currently we use fileset quotas only on our filesets) ? and that?s because there are no user (or group) quotas in those filesets.  The script, however, reads in that text file that is created nightly by root that is nothing more than the output of ?mmlsfileset <filesystem name>?, gets the junction path, looks up the GID of the junction path, and sees if you?re a member of that group.  If you?re not, well, no sense in showing you anything about that fileset.  But, of course, if you are a member of that group, then we do want to show you the fileset quota for that fileset.

So ? my question now is, ?Is there a way for a non-root user? to get the junction path for the fileset(s)?  Thanks?

Kevin

?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633

On Jan 9, 2019, at 7:13 PM, Andrew Beattie <abeattie at au1.ibm.com<mailto:abeattie at au1.ibm.com>> wrote:

Kevin,

That sounds like a useful script
would you care to share?

Thanks
Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: abeattie at au1.ibm.com<mailto:abeattie at au1.ibm.com>


----- Original message -----
From: "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu<mailto:Kevin.Buterbaugh at Vanderbilt.Edu>>
Sent by: gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Cc:
Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?
Date: Thu, Jan 10, 2019 9:22 AM

Hi All,

Let me answer Skylar?s questions in another e-mail, which may also tell whether the rest API is a possibility or not.

The Python script in question is to display quota information for a user.  The mmlsquota command has a couple of issues:  1) its output is confusing to some of our users, 2) more significantly, it displays a ton of information that doesn?t apply to the user running it.  For example, it will display all the filesets in a filesystem whether or not the user has access to them.  So the Python script figures out what group(s) the user is a member of and only displays information pertinent to them (i.e. the group of the fileset junction path is a group this user is a member of) ? and in a simplified (and potentially colorized) output format.

And typing that preceding paragraph caused the lightbulb to go off ? I know the answer to my own question ? have the script run mmlsquota and get the full list of filesets from that, then parse that to determine which ones I actually need to display quota information for.  Thanks!

Kevin
?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633

On Jan 9, 2019, at 4:42 PM, Simon Thompson <S.J.Thompson at bham.ac.uk<mailto:S.J.Thompson at bham.ac.uk>> wrote:

Hi Kevin,

Have you looked at the rest API?

https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_5.0.2%2Fcom.ibm.spectrum.scale.v5r02.doc%2Fbl1adm_listofapicommands.htm&amp;data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&amp;sdata=uotWilntiZa2E9RIBE2ikhxxBm3Mk3y%2FW%2FKUHovaJpY%3D&amp;reserved=0<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_5.0.2%2Fcom.ibm.spectrum.scale.v5r02.doc%2Fbl1adm_listofapicommands.htm&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cc1ffac821c5f4524104908d67698e948%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636826796466979672&sdata=0%2FsV4sgFJmyJjaETdNtmsRP21pm5zFSYdpr9eNtDSs4%3D&reserved=0>

I don't know how much access control there is available in the API so not sure if you could lock some sort of service user down to just the get filesets command?

Simon
_______________________________________
From: gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org> [gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu<mailto:Kevin.Buterbaugh at Vanderbilt.Edu>]
Sent: 08 January 2019 22:12
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

Hi All,

Happy New Year to all!  Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best!

Is there a way to get a list of the filesets in a filesystem without running mmlsfileset?  I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done.

The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem.  There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads.  That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all.

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine.

?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu><mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&amp;data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&amp;sdata=WSijRrjhOgQyuWsh9K8ckpjf%2F2CkXfZW1n%2BJw5Gw5tw%3D&amp;reserved=0
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss<https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cc1ffac821c5f4524104908d67698e948%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826796466989681&sdata=u32oTmCl3WXoCpPQQplOtAeAuzfNfHBjUmg%2BU194rbI%3D&reserved=0>


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&amp;data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cc1ffac821c5f4524104908d67698e948%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826796467009700&amp;sdata=Xfz4JiItI8ukHgnvO5YoN27jVpk6Ngsk03NtMrKJcHk%3D&amp;reserved=0

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190110/ac7ec2aa/attachment.htm>

From p.childs at qmul.ac.uk  Fri Jan 11 12:50:17 2019
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Fri, 11 Jan 2019 12:50:17 +0000
Subject: [gpfsug-discuss] Get list of filesets _without_ running
 mmlsfileset?
In-Reply-To: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu>
References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu>
Message-ID: <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk>

We have a similar issue, I'm wondering if getting mmlsfileset to work as a user is a reasonable "request for enhancement" I suspect it would need better wording.

We too have a rather complex script to report on quota's that I suspect does a similar job. It works by having all the filesets mounted in known locations and names matching mount point names. It then works out which ones are needed by looking at the group ownership, Its very slow and a little cumbersome. Not least because it was written ages ago in a mix of bash, sed, awk and find.


On Tue, 2019-01-08 at 22:12 +0000, Buterbaugh, Kevin L wrote:
Hi All,

Happy New Year to all!  Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best!

Is there a way to get a list of the filesets in a filesystem without running mmlsfileset?  I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done.

The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem.  There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads.  That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all.

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine.

?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633


_______________________________________________

gpfsug-discuss mailing list

gpfsug-discuss at spectrumscale.org

http://gpfsug.org/mailman/listinfo/gpfsug-discuss


--

Peter Childs
ITS Research Storage
Queen Mary, University of London

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190111/31536a7d/attachment.htm>

From S.J.Thompson at bham.ac.uk  Fri Jan 11 14:19:28 2019
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Fri, 11 Jan 2019 14:19:28 +0000
Subject: [gpfsug-discuss] A cautionary tale of upgrades
Message-ID: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk>


I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ?

We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes.

Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes.

As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code.

All was well.

The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to content with.

More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up.

A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. On our protocol nodes, we find output of the form:
bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster

Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name.

So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine.

We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages.

Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output.

So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke.

We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware!

Simon


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190111/c9d4a7fb/attachment.htm>

From MDIETZ at de.ibm.com  Fri Jan 11 14:58:20 2019
From: MDIETZ at de.ibm.com (Mathias Dietz)
Date: Fri, 11 Jan 2019 15:58:20 +0100
Subject: [gpfsug-discuss] A cautionary tale of upgrades
In-Reply-To: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk>
References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk>
Message-ID: <OFCD21C556.30D43FB7-ONC125837F.0051BFA9-C125837F.00523EAD@notes.na.collabserv.com>

Hi Simon, 

you likely run into the following issue:

APAR IV93896 - https://www-01.ibm.com/support/docview.wss?uid=isg1IV93896

This problem happens only if you use different host domains within a 
cluster and will mostly impact CES. It is unrelated to upgrade or mixed 
version clusters.

Its has been fixed with 5.0.2, therefore I recommend to upgrade soon. 


Mit freundlichen Gr??en / Kind regards

Mathias Dietz

Spectrum Scale Development - Release Lead Architect (4.2.x)
Spectrum Scale RAS Architect
---------------------------------------------------------------------------
IBM Deutschland
Am Weiher 24
65451 Kelsterbach
Phone: +49 70342744105
Mobile: +49-15152801035
E-Mail: mdietz at de.ibm.com
-----------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk 
WittkoppSitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht 
Stuttgart, HRB 243294


From:   Simon Thompson <S.J.Thompson at bham.ac.uk>
To:     "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:   11/01/2019 15:19
Subject:        [gpfsug-discuss] A cautionary tale of upgrades
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


I?ll start by saying this is our experience, maybe we did something stupid 
along the way, but just in case others see similar issues ?
 
We have a cluster which contains protocol nodes, these were all happily 
running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum 
node ? manager and quorum functions were handled by the 4 protocol nodes.
 
Then one day we needed to reboot a protocol node. We did so and its disk 
controller appeared to have failed. Oh well, we thought we?ll fix that 
another day, we still have three other quorum nodes.
 
As they are all getting a little long in the tooth and were starting to 
struggle, we thought, well we have DME, lets add some new nodes for quorum 
and token functions. Being shiny and new they were all installed with GPFS 
5.0.2-1 code.
 
All was well.
 
The some-time later, we needed to restart another of the CES nodes, when 
we started GPFS on the node, it was causing havock in our cluster ? CES 
IPs were constantly being assigned, then removed from the remaining nodes 
in the cluster. Crap we thought and disabled the node in the cluster. This 
made things stabilise and as we?d been having other GPFS issues, we didn?t 
want service to be interrupted whilst we dug into this. Besides, it was 
nearly Christmas and we had conferences and other work to content with.
 
More time passes and we?re about to cut over all our backend storage to 
some shiny new DSS-G kit, so we plan a whole system maintenance window. We 
finish all our data sync?s and then try to start our protocol nodes to 
test them. No dice ? we can?t get any of the nodes to bring up IPs, the 
logs look like they start the assignment process, but then gave up.
 
A lot of digging in the mm korn shell scripts, and some studious use of 
DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl 
shownodes up?. On our protocol nodes, we find output of the form:
bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster
 
Now our DNS name for these nodes is bb2.cluster ? something is repeating 
the DNS name.
 
So we dig around, resolv.conf, /etc/hosts etc all look good and name 
resolution seems fine.
 
We look around on the manager/quorum nodes and they don?t do this 
cluster.cluster thing. We can?t find anything else Linux config wise that 
looks bad. In fact the only difference is that our CES nodes are running 
5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole 
storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the 
CES nodes, (we?ve been bitten before with SMB packages not working 
properly in our environment), but we go ahead and do GPFS and NFS 
packages.
 
Suddenly, magically all is working again. CES starts fine and IPs get 
assigned OK. And tsctl gives the correct output.
 
So, my supposition is that there is some incompatibility between 5.0.1-2 
and 5.0.2-1 when running CES and the cluster manager is running on 
5.0.2-1. As I said before, I don?t have hard evidence we did something 
stupid, but it certainly is fishy. We?re guessing this same ?feature? was 
the cause of the CES issues we saw when we rebooted a CES node and the IPs 
kept deassigning? It looks like all was well as we added the manager nodes 
after CES was started, but when a CES node restarted, things broke.
 
We got everything working again in house so didn?t raise a PMR, but if you 
find yourself in this upgrade path, beware!
 
Simon
 
 _______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190111/bd9402b2/attachment.htm>

From Renar.Grunenberg at huk-coburg.de  Fri Jan 11 15:00:51 2019
From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar)
Date: Fri, 11 Jan 2019 15:00:51 +0000
Subject: [gpfsug-discuss] A cautionary tale of upgrades
In-Reply-To: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk>
References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk>
Message-ID: <AB66FAB6-82ED-4022-863B-30E3743FAA39@huk-coburg.de>

Hallo Simon,
Welcome to the Club. These behavior are a Bug in tsctl to change the DNS names . We had this already  4 weeks  ago. The fix was Update to 5.0.2.1.
Regards Renar


Von meinem iPhone gesendet


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:        09561 96-44110
Telefax:        09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet:       www.huk.de
________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________

Am 11.01.2019 um 15:19 schrieb Simon Thompson <S.J.Thompson at bham.ac.uk<mailto:S.J.Thompson at bham.ac.uk>>:


I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ?

We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes.

Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes.

As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code.

All was well.

The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to content with.

More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up.

A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. On our protocol nodes, we find output of the form:
bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster

Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name.

So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine.

We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages.

Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output.

So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke.

We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware!

Simon


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190111/b524af8e/attachment.htm>

From S.J.Thompson at bham.ac.uk  Fri Jan 11 15:48:50 2019
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Fri, 11 Jan 2019 15:48:50 +0000
Subject: [gpfsug-discuss] A cautionary tale of upgrades
In-Reply-To: <OFCD21C556.30D43FB7-ONC125837F.0051BFA9-C125837F.00523EAD@notes.na.collabserv.com>
References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk>,
	<OFCD21C556.30D43FB7-ONC125837F.0051BFA9-C125837F.00523EAD@notes.na.collabserv.com>
Message-ID: <CF45EE16DEF2FE4B9AA7FF2B6EE2654501297AEFF5@EX13.adf.bham.ac.uk>

Could well be.

Still it's pretty scary that this sort of thing could hit you way after the different DNS name nodes were added. It might be months before you restart the CES nodes.

Simon
________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of MDIETZ at de.ibm.com [MDIETZ at de.ibm.com]
Sent: 11 January 2019 14:58
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] A cautionary tale of upgrades

Hi Simon,

you likely run into the following issue:

APAR IV93896 - https://www-01.ibm.com/support/docview.wss?uid=isg1IV93896

This problem happens only if you use different host domains within a cluster and will mostly impact CES. It is unrelated to upgrade or mixed version clusters.

Its has been fixed with 5.0.2, therefore I recommend to upgrade soon.


Mit freundlichen Gr??en / Kind regards

Mathias Dietz

Spectrum Scale Development - Release Lead Architect (4.2.x)
Spectrum Scale RAS Architect
---------------------------------------------------------------------------
IBM Deutschland
Am Weiher 24
65451 Kelsterbach
Phone: +49 70342744105
Mobile: +49-15152801035
E-Mail: mdietz at de.ibm.com
-----------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk WittkoppSitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294


From:        Simon Thompson <S.J.Thompson at bham.ac.uk>
To:        "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date:        11/01/2019 15:19
Subject:        [gpfsug-discuss] A cautionary tale of upgrades
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________


I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ?

We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes.

Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes.

As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code.

All was well.

The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to content with.

More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up.

A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. On our protocol nodes, we find output of the form:
bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster

Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name.

So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine.

We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages.

Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output.

So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke.

We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware!

Simon

 _______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


From makaplan at us.ibm.com  Fri Jan 11 17:31:35 2019
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Fri, 11 Jan 2019 14:31:35 -0300
Subject: [gpfsug-discuss] Get list offilesets_without_runningmmlsfileset?
In-Reply-To: <6A909228-87E7-468E-A51C-086B9C75BB18@vanderbilt.edu>
References: <E3617DD4-EA9B-4C1D-A0BA-CF369F0F93E9@vanderbilt.edu><30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu><CF45EE16DEF2FE4B9AA7FF2B6EE265450129793978@EX13.adf.bham.ac.uk><OF71EA19F2.D6CC0FE3-ON0025837E.0006C4AC-0025837E.0006C4B2@notes.na.collabserv.com>
	<6A909228-87E7-468E-A51C-086B9C75BB18@vanderbilt.edu>
Message-ID: <OF84E91E2E.636F509A-ON0325837F.005F9E2A-0325837F.006047B1@notes.na.collabserv.com>

?Is there a way for a non-root user? to get the junction path for the 
fileset(s)? 

Presuming the user has some path to some file in the fileset...
Issue `mmlsattr -L path` then "walk" back towards the root by discarding 
successive path suffixes and watch for changes in the fileset name field 

Why doesn't mmlsfileset work for non-root users? I don't know.  Perhaps 
the argument has to do with security or confidentiality.
On my test system it gives a bogus error, when it should say something 
about root or super-user.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190111/3c7181ff/attachment.htm>

From JRLang at uwyo.edu  Fri Jan 11 16:24:17 2019
From: JRLang at uwyo.edu (Jeffrey R. Lang)
Date: Fri, 11 Jan 2019 16:24:17 +0000
Subject: [gpfsug-discuss] Get list of filesets _without_ running
 mmlsfileset?
In-Reply-To: <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk>
References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu>
	<300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk>
Message-ID: <BL0PR05MB50118B21ECB103E3A45711ECA8850@BL0PR05MB5011.namprd05.prod.outlook.com>

What we do is the use ?mmlsquota -Y <device>? which will list out all the filesets in an easily parseable format.   And the command can be run by the user.


From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Peter Childs
Sent: Friday, January 11, 2019 6:50 AM
To: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

? This message was sent from a non-UWYO address. Please exercise caution when clicking links or opening attachments from external sources.

We have a similar issue, I'm wondering if getting mmlsfileset to work as a user is a reasonable "request for enhancement" I suspect it would need better wording.


We too have a rather complex script to report on quota's that I suspect does a similar job. It works by having all the filesets mounted in known locations and names matching mount point names. It then works out which ones are needed by looking at the group ownership, Its very slow and a little cumbersome. Not least because it was written ages ago in a mix of bash, sed, awk and find.


On Tue, 2019-01-08 at 22:12 +0000, Buterbaugh, Kevin L wrote:
Hi All,

Happy New Year to all!  Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best!

Is there a way to get a list of the filesets in a filesystem without running mmlsfileset?  I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done.

The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem.  There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads.  That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all.

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine.

?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633


_______________________________________________

gpfsug-discuss mailing list

gpfsug-discuss at spectrumscale.org

http://gpfsug.org/mailman/listinfo/gpfsug-discuss


--
Peter Childs
ITS Research Storage
Queen Mary, University of London

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190111/44007842/attachment.htm>

From Kevin.Buterbaugh at Vanderbilt.Edu  Sat Jan 12 03:07:29 2019
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Sat, 12 Jan 2019 03:07:29 +0000
Subject: [gpfsug-discuss] Get list of filesets _without_ running
 mmlsfileset?
In-Reply-To: <BL0PR05MB50118B21ECB103E3A45711ECA8850@BL0PR05MB5011.namprd05.prod.outlook.com>
References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu>
	<300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk>
	<BL0PR05MB50118B21ECB103E3A45711ECA8850@BL0PR05MB5011.namprd05.prod.outlook.com>
Message-ID: <1CD7EBDE-F39D-4410-9028-EF9FBF22C6EC@vanderbilt.edu>

Hi All,

I appreciate the time several of you have taken to respond to my inquiry.  However, unless I?m missing something - and my apologies if I am - none so far appear to allow me to obtain the list of junction paths as a non-root user.  Yes, mmlsquota shows all the filesets.  But from there I need to then be able to find out where that fileset is mounted in the directory tree so that I can see who the owner and group of that directory are.  Only if the user running the script is either the owner or a member of the group do I want to display the fileset quota for that fileset to the user.

Thanks again?

Kevin

On Jan 11, 2019, at 10:24 AM, Jeffrey R. Lang <JRLang at uwyo.edu<mailto:JRLang at uwyo.edu>> wrote:

What we do is the use ?mmlsquota -Y <device>? which will list out all the filesets in an easily parseable format.   And the command can be run by the user.


From: gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org> <gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>> On Behalf Of Peter Childs
Sent: Friday, January 11, 2019 6:50 AM
To: gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

? This message was sent from a non-UWYO address. Please exercise caution when clicking links or opening attachments from external sources.

We have a similar issue, I'm wondering if getting mmlsfileset to work as a user is a reasonable "request for enhancement" I suspect it would need better wording.


We too have a rather complex script to report on quota's that I suspect does a similar job. It works by having all the filesets mounted in known locations and names matching mount point names. It then works out which ones are needed by looking at the group ownership, Its very slow and a little cumbersome. Not least because it was written ages ago in a mix of bash, sed, awk and find.


On Tue, 2019-01-08 at 22:12 +0000, Buterbaugh, Kevin L wrote:
Hi All,

Happy New Year to all!  Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best!

Is there a way to get a list of the filesets in a filesystem without running mmlsfileset?  I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done.

The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem.  There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads.  That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all.

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine.

?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633


_______________________________________________

gpfsug-discuss mailing list

gpfsug-discuss at spectrumscale.org<http://spectrumscale.org/>

http://gpfsug.org/mailman/listinfo/gpfsug-discuss<https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cee10c1e22a474fedceb408d678318231%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636828551398045995&sdata=m0nd3Ln0qalNEUCGQmj%2B2ZEQXYCTPzKiYJcSmFXkGZQ%3D&reserved=0>


--

Peter Childs
ITS Research Storage
Queen Mary, University of London

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org/>
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&amp;data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cee10c1e22a474fedceb408d678318231%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636828551398056004&amp;sdata=F56RKhMef0zYjAj2dKFu3bAuq7xQvFoulYhwDnfN1Ms%3D&amp;reserved=0

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190112/1d1ba6dd/attachment.htm>

From valdis.kletnieks at vt.edu  Sat Jan 12 20:42:42 2019
From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu)
Date: Sat, 12 Jan 2019 15:42:42 -0500
Subject: [gpfsug-discuss] Get list of filesets _without_ running
	mmlsfileset?
In-Reply-To: <1CD7EBDE-F39D-4410-9028-EF9FBF22C6EC@vanderbilt.edu>
References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu>
	<300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk>
	<BL0PR05MB50118B21ECB103E3A45711ECA8850@BL0PR05MB5011.namprd05.prod.outlook.com>
	<1CD7EBDE-F39D-4410-9028-EF9FBF22C6EC@vanderbilt.edu>
Message-ID: <13713.1547325762@turing-police.cc.vt.edu>

On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said:
> But from there I need to then be able to find out where that fileset is
> mounted in the directory tree so that I can see who the owner and group of that
> directory are.

You're not able to leverage a local naming scheme? There's no connection between
the name of the fileset and where it is in the tree?  I would hope there is, because
otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will
now be confused over what director(y/ies) need to be cleaned up.  If your tool
says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at
/gpfs/foo/bar/baz then it's actionable.

And if the user knows what the mapping is, your script can know it too....


From scottg at emailhosting.com  Mon Jan 14 04:09:57 2019
From: scottg at emailhosting.com (Scott Goldman)
Date: Sun, 13 Jan 2019 23:09:57 -0500
Subject: [gpfsug-discuss] Get list of filesets _without_
	running	mmlsfileset?
In-Reply-To: <13713.1547325762@turing-police.cc.vt.edu>
Message-ID: <u569ppa8lavhogh5slk01oju.1547438997683@emailhosting.com>

Kevin,
Something I've done in the past is to create a service that once an hour/day/week that would build a static file that consists of the needed output.

As long as you can take the update delay (or perhaps trigger the update with a callback), this should work and could actually be lighter on the system.


Sent from my BlackBerry - the most secure mobile device

? Original Message ?
From: valdis.kletnieks at vt.edu
Sent: January 12, 2019 4:07 PM
To: gpfsug-discuss at spectrumscale.org
Reply-to: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running	mmlsfileset?

On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said:
> But from there I need to then be able to find out where that fileset is
> mounted in the directory tree so that I can see who the owner and group of that
> directory are.

You're not able to leverage a local naming scheme? There's no connection between
the name of the fileset and where it is in the tree?? I would hope there is, because
otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will
now be confused over what director(y/ies) need to be cleaned up.? If your tool
says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at
/gpfs/foo/bar/baz then it's actionable.

And if the user knows what the mapping is, your script can know it too....

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From olaf.weiser at de.ibm.com  Mon Jan 14 06:31:28 2019
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Mon, 14 Jan 2019 07:31:28 +0100
Subject: [gpfsug-discuss] A cautionary tale of upgrades
In-Reply-To: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk>
References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk>
Message-ID: <OF6D5DC3FB.FA9709D1-ONC1258382.0023353E-C1258382.0023D6C5@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190114/c63e57c0/attachment.htm>

From sandeep.patil at in.ibm.com  Mon Jan 14 12:54:29 2019
From: sandeep.patil at in.ibm.com (Sandeep Ramesh)
Date: Mon, 14 Jan 2019 12:54:29 +0000
Subject: [gpfsug-discuss] Latest Technical Blogs on IBM Spectrum Scale (Q4
	2018)
In-Reply-To: <OF0BEA5F18.0E4A8655-ON6525831B.0051B859-6525831B.00540D1A@LocalDomain>
References: <OF7A360CDE.FA6DB691-ON652581DA.005047B1-652581DA.00510C76@LocalDomain>
	<OF574EC5A3.432467EB-ON65258211.00247AF9-65258211.0024E8C2@LocalDomain>
	<OF3AFFA28C.972DCC84-ON6525825D.0040EC76-6525825D.004159E3@LocalDomain>
	<OFA6EC728F.FF378285-ON652582BE.00649A77-652582BE.0066D779@LocalDomain>
	<OF0BEA5F18.0E4A8655-ON6525831B.0051B859-6525831B.00540D1A@LocalDomain>
Message-ID: <OFDAD8861F.EBFB80F2-ON65258382.0045FED0-65258382.0046E7E2@notes.na.collabserv.com>

Dear User Group Members,

In continuation, here are list of development blogs in the this quarter 
(Q4 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As 
discussed in User Groups, passing it along to the emailing list.

Redpaper: IBM Spectrum Scale and IBM StoredIQ: Identifying and securing 
your business data to support regulatory requirements
http://www.redbooks.ibm.com/abstracts/redp5525.html?Open

IBM Spectrum Scale Memory Usage
https://www.slideshare.net/tomerperry/ibm-spectrum-scale-memory-usage?qid=50a1dfda-3102-484f-b9d0-14b69fc4800b&v=&b=&from_search=2

Spectrum Scale and Containers
https://developer.ibm.com/storage/2018/12/20/spectrum-scale-and-containers/

IBM Elastic Storage Server Performance Graphical Visualization with 
Grafana
https://developer.ibm.com/storage/2018/12/18/ibm-elastic-storage-server-performance-graphical-visualization-with-grafana/

Hadoop Performance for disaggregated compute and storage configurations 
based on IBM Spectrum Scale Storage
https://developer.ibm.com/storage/2018/12/13/hadoop-performance-for-disaggregated-compute-and-storage-configurations-based-on-ibm-spectrum-scale-storage/

EMS HA in ESS LE (Little Endian) environment
https://developer.ibm.com/storage/2018/12/07/ems-ha-in-ess-le-little-endian-environment/

What?s new in ESS 5.3.2
https://developer.ibm.com/storage/2018/12/04/whats-new-in-ess-5-3-2/

Administer your Spectrum Scale cluster easily
https://developer.ibm.com/storage/2018/11/13/administer-your-spectrum-scale-cluster-easily/

Disaster Recovery using Spectrum Scale?s Active File Management
https://developer.ibm.com/storage/2018/11/13/disaster-recovery-using-spectrum-scales-active-file-management/

Recovery Group Failover Procedure of IBM Elastic Storage Server (ESS)
https://developer.ibm.com/storage/2018/10/08/recovery-group-failover-procedure-ibm-elastic-storage-server-ess/

Whats new in IBM Elastic Storage Server (ESS) Version 5.3.1 and 5.3.1.1
https://developer.ibm.com/storage/2018/10/04/whats-new-ibm-elastic-storage-server-ess-version-5-3-1-5-3-1-1/

For more : Search /browse here: https://developer.ibm.com/storage/blog

User Group Presentations: 
https://www.spectrumscale.org/presentations/

Consolidation list: 
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media


From:   Sandeep Ramesh/India/IBM
To:     gpfsug-discuss at spectrumscale.org
Date:   10/03/2018 08:48 PM
Subject:        Latest Technical Blogs on IBM Spectrum Scale (Q3 2018)


Dear User Group Members,

In continuation, here are list of development blogs in the this quarter 
(Q3 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As 
discussed in User Groups, passing it along to the emailing list.

How NFS exports became more dynamic with Spectrum Scale 5.0.2
https://developer.ibm.com/storage/2018/10/02/nfs-exports-became-dynamic-spectrum-scale-5-0-2/

HPC storage on AWS (IBM Spectrum Scale)
https://developer.ibm.com/storage/2018/10/02/hpc-storage-aws-ibm-spectrum-scale/

Upgrade with Excluding the node(s) using Install-toolkit
https://developer.ibm.com/storage/2018/09/30/upgrade-excluding-nodes-using-install-toolkit/

Offline upgrade using Install-toolkit
https://developer.ibm.com/storage/2018/09/30/offline-upgrade-using-install-toolkit/

IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 
5.0.2 ?
https://developer.ibm.com/storage/2018/09/21/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-2/

What?s New in IBM Spectrum Scale 5.0.2 ?
https://developer.ibm.com/storage/2018/09/15/whats-new-ibm-spectrum-scale-5-0-2/

Starting IBM Spectrum Scale 5.0.2 release, the installation toolkit 
supports upgrade rerun if fresh upgrade fails.
https://developer.ibm.com/storage/2018/09/15/starting-ibm-spectrum-scale-5-0-2-release-installation-toolkit-supports-upgrade-rerun-fresh-upgrade-fails/

IBM Spectrum Scale installation toolkit ? enhancements over releases ? 
5.0.2.0
https://developer.ibm.com/storage/2018/09/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases-5-0-2-0/

Announcing HDP 3.0 support with IBM Spectrum Scale
https://developer.ibm.com/storage/2018/08/31/announcing-hdp-3-0-support-ibm-spectrum-scale/

IBM Spectrum Scale Tuning Overview for Hadoop Workload
https://developer.ibm.com/storage/2018/08/20/ibm-spectrum-scale-tuning-overview-hadoop-workload/

Making the Most of Multicloud Storage
https://developer.ibm.com/storage/2018/08/13/making-multicloud-storage/

Disaster Recovery for Transparent Cloud Tiering using SOBAR
https://developer.ibm.com/storage/2018/08/13/disaster-recovery-transparent-cloud-tiering-using-sobar/

Your Optimal Choice of AI Storage for Today and Tomorrow
https://developer.ibm.com/storage/2018/08/10/spectrum-scale-ai-workloads/

Analyze IBM Spectrum Scale File Access Audit with ELK Stack
https://developer.ibm.com/storage/2018/07/30/analyze-ibm-spectrum-scale-file-access-audit-elk-stack/

Mellanox SX1710 40G switch MLAG configuration for IBM ESS
https://developer.ibm.com/storage/2018/07/12/mellanox-sx1710-40g-switcher-mlag-configuration/

Protocol Problem Determination Guide for IBM Spectrum Scale? ? SMB and NFS 
Access issues
https://developer.ibm.com/storage/2018/07/10/protocol-problem-determination-guide-ibm-spectrum-scale-smb-nfs-access-issues/

Access Control in IBM Spectrum Scale Object
https://developer.ibm.com/storage/2018/07/06/access-control-ibm-spectrum-scale-object/

IBM Spectrum Scale HDFS Transparency Docker support
https://developer.ibm.com/storage/2018/07/06/ibm-spectrum-scale-hdfs-transparency-docker-support/

Protocol Problem Determination Guide for IBM Spectrum Scale? ? Log 
Collection
https://developer.ibm.com/storage/2018/07/04/protocol-problem-determination-guide-ibm-spectrum-scale-log-collection/


Redpapers

IBM Spectrum Scale Immutability Introduction, Configuration Guidance, 
and Use Cases
http://www.redbooks.ibm.com/abstracts/redp5507.html?Open

Certifications
Assessment of the immutability function of IBM Spectrum Scale Version 5.0 
in accordance to US SEC17a-4f, EU GDPR Article 21 Section 1, German and 
Swiss laws and regulations in collaboration with KPMG.

Certificate: 
http://www.kpmg.de/bescheinigungen/RequestReport.aspx?DE968667B47544FF83F6CCDCF37E5FB5
Full assessment report: 
http://www.kpmg.de/bescheinigungen/RequestReport.aspx?B290411BE1224F5A9B4D24663BCD3C5D

For more : Search /browse here: https://developer.ibm.com/storage/blog

Consolidation list: 
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media


From:   Sandeep Ramesh/India/IBM
To:     gpfsug-discuss at spectrumscale.org
Date:   07/03/2018 12:13 AM
Subject:        Re: Latest Technical Blogs on Spectrum Scale (Q2 2018)


Dear User Group Members,

In continuation , here are list of development blogs in the this quarter 
(Q2 2018). We now have over 100+ developer blogs. As discussed in User 
Groups, passing it along:

IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object
https://developer.ibm.com/storage/2018/06/15/6494/

IBM Spectrum Scale ILM Policies
https://developer.ibm.com/storage/2018/06/02/ibm-spectrum-scale-ilm-policies/

IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object
https://developer.ibm.com/storage/2018/06/15/6494/

Management GUI enhancements in IBM Spectrum Scale release 5.0.1
https://developer.ibm.com/storage/2018/05/18/management-gui-enhancements-in-ibm-spectrum-scale-release-5-0-1/


Managing IBM Spectrum Scale services through GUI
https://developer.ibm.com/storage/2018/05/18/managing-ibm-spectrum-scale-services-through-gui/


Use AWS CLI with IBM Spectrum Scale? object storage
https://developer.ibm.com/storage/2018/05/16/use-awscli-with-ibm-spectrum-scale-object-storage/

Hadoop Storage Tiering with IBM Spectrum Scale
https://developer.ibm.com/storage/2018/05/09/hadoop-storage-tiering-ibm-spectrum-scale/

How many Files on my Filesystem?
https://developer.ibm.com/storage/2018/05/07/many-files-filesystem/

Recording Spectrum Scale Object Stats for Potential Billing like Purpose 
using Elasticsearch
https://developer.ibm.com/storage/2018/05/04/spectrum-scale-object-stats-for-billing-using-elasticsearch/

New features in IBM Elastic Storage Server (ESS) Version 5.3
https://developer.ibm.com/storage/2018/04/09/new-features-ibm-elastic-storage-server-ess-version-5-3/


Using IBM Spectrum Scale for storage in IBM Cloud Private (Missed to send 
earlier)
https://medium.com/ibm-cloud/ibm-spectrum-scale-with-ibm-cloud-private-8bf801796f19


Redpapers

Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for 
Building an Integrated Solution
http://www.redbooks.ibm.com/redpieces/abstracts/redp5448.html, 

Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent 
Cloud Tiering
http://www.redbooks.ibm.com/abstracts/redp5411.html?Open

SAP HANA and ESS: A Winning Combination (Update)
http://www.redbooks.ibm.com/abstracts/redp5436.html?Open


Others
IBM Spectrum Scale Software Version Recommendation Preventive Service 
Planning (Updated)
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009703, 

IDC Infobrief: A Modular Approach to Genomics Infrastructure at Scale in 
HCLS
https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=37016937USEN&

For more : Search /browse here: https://developer.ibm.com/storage/blog

Consolidation list: 
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media


From:   Sandeep Ramesh/India/IBM
To:     gpfsug-discuss at spectrumscale.org
Date:   03/27/2018 05:23 PM
Subject:        Re: Latest Technical Blogs on Spectrum Scale


Dear User Group Members,

In continuation , here are list of development blogs in the this quarter 
(Q1 2018). As discussed in User Groups, passing it along:

GDPR Compliance and Unstructured Data Storage
https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/

IBM Spectrum Scale for Linux on IBM Z ? Release 5.0 features and 
highlights
https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/

Management GUI enhancements in IBM Spectrum Scale release 5.0.0
https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/

IBM Spectrum Scale 5.0.0 ? What?s new in NFS?
https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/

Benefits and implementation of Spectrum Scale sudo wrappers
https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/

IBM Spectrum Scale: Big Data and Analytics Solution Brief
https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/

Variant Sub-blocks in Spectrum Scale 5.0
https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/

Compression support in Spectrum Scale 5.0.0
https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/

IBM Spectrum Scale Versus Apache Hadoop HDFS
https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/

ESS Fault Tolerance
https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/


Genomic Workloads ? How To Get it Right From Infrastructure Point Of View.


https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/


IBM Spectrum Scale On AWS Cloud : This video explains how to deploy IBM 
Spectrum Scale on AWS. This solution helps the users who require highly 
available access to a shared name space across multiple instances with 
good performance, without requiring an in-depth knowledge of IBM Spectrum 
Scale.
Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4
Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY.

For more : Search /browse here: https://developer.ibm.com/storage/blog

Consolidation list: 
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media


From:   Sandeep Ramesh/India/IBM
To:     gpfsug-discuss at spectrumscale.org
Cc:     Doris Conti/Poughkeepsie/IBM at IBMUS
Date:   01/10/2018 12:13 PM
Subject:        Re: Latest Technical Blogs on Spectrum Scale


Dear User Group Members,

Here are list of development blogs in the last quarter. Passing it to this 
email group as Doris had got a feedback in the UG meetings to notify the 
members with the latest updates periodically.

Genomic Workloads ? How To Get it Right From Infrastructure Point Of View.
https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/

IBM Spectrum Scale Versus Apache Hadoop HDFS
https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/

ESS Fault Tolerance
https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/

IBM Spectrum Scale MMFSCK ? Savvy Enhancements
https://developer.ibm.com/storage/2018/01/05/ibm-spectrum-scale-mmfsck-savvy-enhancements/

ESS Disk Management
https://developer.ibm.com/storage/2018/01/02/ess-disk-management/

IBM Spectrum Scale Object Protocol On Ubuntu
https://developer.ibm.com/storage/2018/01/01/ibm-spectrum-scale-object-protocol-ubuntu/

IBM Spectrum Scale 5.0 ? Whats new in Unified File and Object
https://developer.ibm.com/storage/2017/12/20/ibm-spectrum-scale-5-0-whats-new-object/

A Complete Guide to ? Protocol Problem Determination Guide for IBM 
Spectrum Scale? ? Part 1
https://developer.ibm.com/storage/2017/12/19/complete-guide-protocol-problem-determination-guide-ibm-spectrum-scale-1/

IBM Spectrum Scale installation toolkit ? enhancements over releases
https://developer.ibm.com/storage/2017/12/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases/

Network requirements in an Elastic Storage Server Setup
https://developer.ibm.com/storage/2017/12/13/network-requirements-in-an-elastic-storage-server-setup/

Co-resident migration with Transparent cloud tierin
https://developer.ibm.com/storage/2017/12/05/co-resident-migration-transparent-cloud-tierin/

IBM Spectrum Scale on Hortonworks HDP Hadoop clusters : A Complete Big 
Data Solution
https://developer.ibm.com/storage/2017/12/05/ibm-spectrum-scale-hortonworks-hdp-hadoop-clusters-complete-big-data-solution/

Big data analytics with Spectrum Scale using remote cluster mount & 
multi-filesystem support
https://developer.ibm.com/storage/2017/11/28/big-data-analytics-spectrum-scale-using-remote-cluster-mount-multi-filesystem-support/

IBM Spectrum Scale HDFS Transparency Short Circuit Write Support
https://developer.ibm.com/storage/2017/11/28/ibm-spectrum-scale-hdfs-transparency-short-circuit-write-support/

IBM Spectrum Scale HDFS Transparency Federation Support
https://developer.ibm.com/storage/2017/11/27/ibm-spectrum-scale-hdfs-transparency-federation-support/

How to configure and performance tuning different system workloads on IBM 
Spectrum Scale Sharing Nothing Cluster
https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-different-system-workloads-ibm-spectrum-scale-sharing-nothing-cluster/

How to configure and performance tuning Spark workloads on IBM Spectrum 
Scale Sharing Nothing Cluster
https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-spark-workloads-ibm-spectrum-scale-sharing-nothing-cluster/

How to configure and performance tuning database workloads on IBM Spectrum 
Scale Sharing Nothing Cluster
https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-database-workloads-ibm-spectrum-scale-sharing-nothing-cluster/

How to configure and performance tuning Hadoop workloads on IBM Spectrum 
Scale Sharing Nothing Cluster

https://developer.ibm.com/storage/2017/11/24/configure-performance-tuning-hadoop-workloads-ibm-spectrum-scale-sharing-nothing-cluster/

IBM Spectrum Scale Sharing Nothing Cluster Performance Tuning
https://developer.ibm.com/storage/2017/11/24/ibm-spectrum-scale-sharing-nothing-cluster-performance-tuning/

How to Configure IBM Spectrum Scale? with NIS based Authentication.
https://developer.ibm.com/storage/2017/11/21/configure-ibm-spectrum-scale-nis-based-authentication/


For more : Search /browse here: https://developer.ibm.com/storage/blog

Consolidation list: 
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media


From:   Sandeep Ramesh/India/IBM
To:     gpfsug-discuss at spectrumscale.org
Cc:     Doris Conti/Poughkeepsie/IBM at IBMUS
Date:   11/16/2017 08:15 PM
Subject:        Latest Technical Blogs on Spectrum Scale


Dear User Group members,

Here are the Development Blogs in last 3 months on Spectrum Scale 
Technical Topics.

Spectrum Scale Monitoring ? Know More ?
https://developer.ibm.com/storage/2017/11/16/spectrum-scale-monitoring-know/

IBM Spectrum Scale 5.0 Release ? What?s coming !
https://developer.ibm.com/storage/2017/11/14/ibm-spectrum-scale-5-0-release-whats-coming/

Four Essentials things to know for managing data ACLs on IBM Spectrum 
Scale? from Windows
https://developer.ibm.com/storage/2017/11/13/four-essentials-things-know-managing-data-acls-ibm-spectrum-scale-windows/

GSSUTILS: A new way of running SSR, Deploying or Upgrading ESS Server
https://developer.ibm.com/storage/2017/11/13/gssutils/

IBM Spectrum Scale Object Authentication
https://developer.ibm.com/storage/2017/11/02/spectrum-scale-object-authentication/

Video Surveillance ? Choosing the right storage
https://developer.ibm.com/storage/2017/11/02/video-surveillance-choosing-right-storage/


IBM Spectrum scale object deep dive training with problem determination
https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-object-deep-dive-training


Spectrum Scale as preferred software defined storage for Ubuntu OpenStack
https://developer.ibm.com/storage/2017/09/29/spectrum-scale-preferred-software-defined-storage-ubuntu-openstack/

IBM Elastic Storage Server 2U24 Storage ? an All-Flash offering, a 
performance workhorse
https://developer.ibm.com/storage/2017/10/06/ess-5-2-flash-storage/

A Complete Guide to Configure LDAP-based authentication with IBM Spectrum 
Scale? for File Access
https://developer.ibm.com/storage/2017/09/21/complete-guide-configure-ldap-based-authentication-ibm-spectrum-scale-file-access/

Deploying IBM Spectrum Scale on AWS Quick Start
https://developer.ibm.com/storage/2017/09/18/deploy-ibm-spectrum-scale-on-aws-quick-start/

Monitoring Spectrum Scale Object metrics
https://developer.ibm.com/storage/2017/09/14/monitoring-spectrum-scale-object-metrics/

Tier your data with ease to Spectrum Scale Private Cloud(s) using Moonwalk 
Universal

https://developer.ibm.com/storage/2017/09/14/tier-data-ease-spectrum-scale-private-clouds-using-moonwalk-universal/

Why do I see owner as ?Nobody? for my export mounted using NFSV4 Protocol 
on IBM Spectrum Scale??
https://developer.ibm.com/storage/2017/09/08/see-owner-nobody-export-mounted-using-nfsv4-protocol-ibm-spectrum-scale/

IBM Spectrum Scale? Authentication using Active Directory and LDAP
https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-ldap/

IBM Spectrum Scale? Authentication using Active Directory and RFC2307
https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-rfc2307/

High Availability Implementation with IBM Spectrum Virtualize and IBM 
Spectrum Scale
https://developer.ibm.com/storage/2017/08/30/high-availability-implementation-ibm-spectrum-virtualize-ibm-spectrum-scale/

10 Frequently asked Questions on configuring Authentication using AD + 
AUTO ID mapping on IBM Spectrum Scale?.
https://developer.ibm.com/storage/2017/08/04/10-frequently-asked-questions-configuring-authentication-using-ad-auto-id-mapping-ibm-spectrum-scale/

IBM Spectrum Scale? Authentication using Active Directory
https://developer.ibm.com/storage/2017/07/30/ibm-spectrum-scale-auth-using-active-directory/

Five cool things that you didn?t know Transparent Cloud Tiering on 
Spectrum Scale can do
https://developer.ibm.com/storage/2017/07/29/five-cool-things-didnt-know-transparent-cloud-tiering-spectrum-scale-can/

IBM Spectrum Scale GUI videos
https://developer.ibm.com/storage/2017/07/25/ibm-spectrum-scale-gui-videos/

IBM Spectrum Scale? Authentication ? Planning for NFS Access
https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/

For more : Search /browse here: https://developer.ibm.com/storage/blog

Consolidation list: 
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190114/39ec5796/attachment.htm>

From cabrillo at ifca.unican.es  Tue Jan 15 10:49:58 2019
From: cabrillo at ifca.unican.es (Iban Cabrillo)
Date: Tue, 15 Jan 2019 11:49:58 +0100 (CET)
Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors
Message-ID: <1730394866.8701339.1547549398355.JavaMail.zimbra@ifca.unican.es>

Dear, 
The gpfsgui dashboard show us most part of relevant information for cluster management. Avoiding to install other plot utilities (like graphana for example), we want to explore the possibility to use this packages to harvest and plot this information, in order to centralize the graph management in one only place. We see this information arrives to the gpfsgui node (from non gpfs cluster nodes), but we can't show the plots. 
Is there any way to use the pmsensor and pmcollector packages to monitorice / plot non gpfs cluster nodes using the gpfsgui dashboard ? 
Regards, I 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190115/8b6977b2/attachment.htm>

From Kevin.Buterbaugh at Vanderbilt.Edu  Mon Jan 14 15:02:07 2019
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Mon, 14 Jan 2019 15:02:07 +0000
Subject: [gpfsug-discuss] Get list of filesets
	_without_	running	mmlsfileset?
In-Reply-To: <u569ppa8lavhogh5slk01oju.1547438997683@emailhosting.com>
References: <u569ppa8lavhogh5slk01oju.1547438997683@emailhosting.com>
Message-ID: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu>

Hi Scott and Valdis (and everyone else),

Thanks for your responses.

Yes, we _could_ easily build a local naming scheme ? the name of the fileset matches the name of a folder in one of a couple of parent directories.  However, an earlier response to my post asked if we?d be willing to share our script with the community and we would ? _if_ we can make it generic enough to be useful.  Local naming schemes hardcoded in the script make it much less generically useful.

Plus, it just seems to me that there ought to be a way to do this ? to get a list of fileset names from mmlsquota and then programmatically determine their junction path without having root privileges.  GPFS has got to be storing that information somewhere, and I?m frankly quite surprised that no IBMer has responded with an answer to that.  But I also know that when IBM is silent, there?s typically a reason.

And yes, we could regularly create a static file ? in fact, that?s what we do now once per day (in the early morning hours).  While this is not a huge deal - we only create / delete filesets a handful of times per month - on the day we do the script won?t function properly unless we manually update the file.  I?m wanting to eliminate that, if possible ? which as I stated in the preceding paragraph, I have a hard time believing is not possible.

I did look at the list of callbacks again (good thought!) and there?s not one specifically related to the creation / deletion of a fileset.  There was only one that I saw that I think could even possibly be of use ? ccrFileChange.  Can anyone on the list confirm or deny that the creation / deletion of a fileset would cause that callback to be triggered??  If it is triggered, then we could use that to update the static filesets within a minute or two of the change being made, which would definitely be acceptable.  I realize that many things likely trigger a ccrFileChange, so I?m thinking of having a callback script that checks the current list of filesets against the static file and updates that appropriately.

Thanks again for the responses?

Kevin

> On Jan 13, 2019, at 10:09 PM, Scott Goldman <scottg at emailhosting.com> wrote:
> 
> Kevin,
> Something I've done in the past is to create a service that once an hour/day/week that would build a static file that consists of the needed output.
> 
> As long as you can take the update delay (or perhaps trigger the update with a callback), this should work and could actually be lighter on the system.
> 
> Sent from my BlackBerry - the most secure mobile device
> 
>   Original Message  
> From: valdis.kletnieks at vt.edu
> Sent: January 12, 2019 4:07 PM
> To: gpfsug-discuss at spectrumscale.org
> Reply-to: gpfsug-discuss at spectrumscale.org
> Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running	mmlsfileset?
> 
> On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said:
>> But from there I need to then be able to find out where that fileset is
>> mounted in the directory tree so that I can see who the owner and group of that
>> directory are.
> 
> You're not able to leverage a local naming scheme? There's no connection between
> the name of the fileset and where it is in the tree?  I would hope there is, because
> otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will
> now be confused over what director(y/ies) need to be cleaned up.  If your tool
> says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at
> /gpfs/foo/bar/baz then it's actionable.
> 
> And if the user knows what the mapping is, your script can know it too....
> 


From Kevin.Buterbaugh at Vanderbilt.Edu  Mon Jan 14 15:02:07 2019
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Mon, 14 Jan 2019 15:02:07 +0000
Subject: [gpfsug-discuss] Get list of filesets
	_without_	running	mmlsfileset?
In-Reply-To: <u569ppa8lavhogh5slk01oju.1547438997683@emailhosting.com>
References: <u569ppa8lavhogh5slk01oju.1547438997683@emailhosting.com>
Message-ID: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu>

Hi Scott and Valdis (and everyone else),

Thanks for your responses.

Yes, we _could_ easily build a local naming scheme ? the name of the fileset matches the name of a folder in one of a couple of parent directories.  However, an earlier response to my post asked if we?d be willing to share our script with the community and we would ? _if_ we can make it generic enough to be useful.  Local naming schemes hardcoded in the script make it much less generically useful.

Plus, it just seems to me that there ought to be a way to do this ? to get a list of fileset names from mmlsquota and then programmatically determine their junction path without having root privileges.  GPFS has got to be storing that information somewhere, and I?m frankly quite surprised that no IBMer has responded with an answer to that.  But I also know that when IBM is silent, there?s typically a reason.

And yes, we could regularly create a static file ? in fact, that?s what we do now once per day (in the early morning hours).  While this is not a huge deal - we only create / delete filesets a handful of times per month - on the day we do the script won?t function properly unless we manually update the file.  I?m wanting to eliminate that, if possible ? which as I stated in the preceding paragraph, I have a hard time believing is not possible.

I did look at the list of callbacks again (good thought!) and there?s not one specifically related to the creation / deletion of a fileset.  There was only one that I saw that I think could even possibly be of use ? ccrFileChange.  Can anyone on the list confirm or deny that the creation / deletion of a fileset would cause that callback to be triggered??  If it is triggered, then we could use that to update the static filesets within a minute or two of the change being made, which would definitely be acceptable.  I realize that many things likely trigger a ccrFileChange, so I?m thinking of having a callback script that checks the current list of filesets against the static file and updates that appropriately.

Thanks again for the responses?

Kevin

> On Jan 13, 2019, at 10:09 PM, Scott Goldman <scottg at emailhosting.com> wrote:
> 
> Kevin,
> Something I've done in the past is to create a service that once an hour/day/week that would build a static file that consists of the needed output.
> 
> As long as you can take the update delay (or perhaps trigger the update with a callback), this should work and could actually be lighter on the system.
> 
> Sent from my BlackBerry - the most secure mobile device
> 
>   Original Message  
> From: valdis.kletnieks at vt.edu
> Sent: January 12, 2019 4:07 PM
> To: gpfsug-discuss at spectrumscale.org
> Reply-to: gpfsug-discuss at spectrumscale.org
> Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running	mmlsfileset?
> 
> On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said:
>> But from there I need to then be able to find out where that fileset is
>> mounted in the directory tree so that I can see who the owner and group of that
>> directory are.
> 
> You're not able to leverage a local naming scheme? There's no connection between
> the name of the fileset and where it is in the tree?  I would hope there is, because
> otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will
> now be confused over what director(y/ies) need to be cleaned up.  If your tool
> says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at
> /gpfs/foo/bar/baz then it's actionable.
> 
> And if the user knows what the mapping is, your script can know it too....
> 


From makaplan at us.ibm.com  Tue Jan 15 14:46:18 2019
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Tue, 15 Jan 2019 11:46:18 -0300
Subject: [gpfsug-discuss] Get list of
	filesets_without_runningmmlsfileset?
In-Reply-To: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu>
References: <u569ppa8lavhogh5slk01oju.1547438997683@emailhosting.com>
	<5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu>
Message-ID: <OFC097E468.FCB57B6B-ON03258383.004F2153-03258383.005124A9@notes.na.collabserv.com>

Personally, I agree that there ought to be a way in the product.

In the meawhile, you no doubt already have some ways to tell your users 
where to find their filesets as pathnames.
Otherwise, how are they accessing their files?

And to keep things somewhat sane, I'd bet filesets are all linked to one 
or small number of well known paths in the filesystem.
Like  /AGpfsFilesystem/filesets/...  Plus you could add symlinks and/or as 
has been suggested post info extracted from mmlsfileset and/or mmlsquota.

So as a practical matter, is this an urgent problem...?  Why?  How?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190115/45467c9c/attachment.htm>

From Kevin.Buterbaugh at Vanderbilt.Edu  Tue Jan 15 15:11:41 2019
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Tue, 15 Jan 2019 15:11:41 +0000
Subject: [gpfsug-discuss] Get list
	of	filesets_without_runningmmlsfileset?
In-Reply-To: <OFC097E468.FCB57B6B-ON03258383.004F2153-03258383.005124A9@notes.na.collabserv.com>
References: <u569ppa8lavhogh5slk01oju.1547438997683@emailhosting.com>
	<5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu>
	<OFC097E468.FCB57B6B-ON03258383.004F2153-03258383.005124A9@notes.na.collabserv.com>
Message-ID: <0D5558D9-9003-4B95-9A37-42321E03114D@vanderbilt.edu>

Hi Marc (All),

Yes, I can easily determine where filesets are linked here ? it is, as you said, in just one or two paths.  The script as it stands now has been doing that for several years and only needs a couple of relatively minor tweaks to be even more useful to _us_ by whittling down a couple of edge cases relating to fileset creation / deletion.

However ? there was a request to share the script with the broader community ? something I?m willing to do if I can get it in a state where it would be useful to others with little or no modification.  Anybody who?s been on this list for any length of time knows how much help I?ve received from the community over the years.  I truly appreciate that and would like to give back, even in a minor way, if possible.

But in order to do that the script can?t be full of local assumptions ? that?s it in a nutshell ? that?s why I want to programmatically determine the junction path at run time as a non-root user.

I?ll also mention here that early on in this thread Simon Thompson suggested looking into the REST API.  Sure enough, you can get the information that way ? but, AFAICT, that would require the script to contain a username / password combination that would allow anyone with access to the script to then use that authentication information to access other information within GPFS that we probably don?t want them to have access to.  If I?m mistaken about that, then please feel free to enlighten me.

Thanks again?

Kevin

?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633

On Jan 15, 2019, at 8:46 AM, Marc A Kaplan <makaplan at us.ibm.com<mailto:makaplan at us.ibm.com>> wrote:

Personally, I agree that there ought to be a way in the product.

In the meawhile, you no doubt already have some ways to tell your users where to find their filesets as pathnames.
Otherwise, how are they accessing their files?

And to keep things somewhat sane, I'd bet filesets are all linked to one or small number of well known paths in the filesystem.
Like  /AGpfsFilesystem/filesets/...  Plus you could add symlinks and/or as has been suggested post info extracted from mmlsfileset and/or mmlsquota.

So as a practical matter, is this an urgent problem...?  Why?  How?
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&amp;data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cbd2c28fdb60041f3434e08d67af83b11%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636831603904557717&amp;sdata=A74TTq%2FQvyhEMHaolklbiMAEnaGVuHNiyhVYfn4wRek%3D&amp;reserved=0

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190115/08365a46/attachment.htm>

From rohwedder at de.ibm.com  Tue Jan 15 15:36:39 2019
From: rohwedder at de.ibm.com (Markus Rohwedder)
Date: Tue, 15 Jan 2019 16:36:39 +0100
Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using
	pmsensors
In-Reply-To: <1730394866.8701339.1547549398355.JavaMail.zimbra@ifca.unican.es>
References: <1730394866.8701339.1547549398355.JavaMail.zimbra@ifca.unican.es>
Message-ID: <OF260CED12.6EB39FDA-ON00258383.00547866-C1258383.0055C0C4@notes.na.collabserv.com>


Hello Iban,

the pmsensor and pmcollector packages together with the GUI dashboard and
statistics pages are not designed to be a general monitoring solution.
For example. in many places we are filtering for GPFS nodes that are known
to be cluster members and  we try to match
host names to GPFS node names. This causes the lack of nodes in GUI charts
you are experiencing.
 In addition. the CLI based setup and management of the sensors assume that
sensor nodes are cluster nodes.

We are not intending to open up the internal management and views for data
outside the cluster in the futute.-
The requirements to provide plotting, filtering, aggregation and
calculation in a general plotting environment can be very diverse and we
may not be able to handle this.
So while we are flattered by the request to use our charting capabilities
as a general solution, we propose to use tools like grafana as more general
solution.

Please note that the GUI charts and dashboards have URLs that allow them to
be hyperlinked, so you could also combine other web based charting tools
together with the GUI based charts.

Mit freundlichen Gr??en / Kind regards

Dr. Markus Rohwedder

Spectrum Scale GUI Development
                                                                                   
                                                                                   
 Phone:  +49 7034 6430190      IBM Deutschland Research &                          
                              Development                                          
                                                                                   
 E-Mail: rohwedder at de.ibm.com  Am Weiher 24                                        
                                                                                   
                               65451 Kelsterbach                                   
                                                                                   
                               Germany                                             
                                                                                   
                                                                                   
From:	Iban Cabrillo <cabrillo at ifca.unican.es>
To:	gpfsug-discuss <gpfsug-discuss at spectrumscale.org>
Date:	15.01.2019 12:05
Subject:	[gpfsug-discuss] Monitoring non gpfs cluster nodes using
            pmsensors
Sent by:	gpfsug-discuss-bounces at spectrumscale.org


Dear,
    The gpfsgui dashboard show us most part of relevant information for
cluster management. Avoiding to install other plot utilities (like graphana
for example), we want to explore the possibility to use this packages to
harvest and plot this information, in order to centralize the graph
management in one only place. We see this information arrives to the
gpfsgui node (from non gpfs cluster nodes), but we can't show the plots.
        Is there any way to use the pmsensor and pmcollector packages to
monitorice / plot non gpfs cluster nodes using the gpfsgui dashboard ?
Regards, I
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190115/9226d2af/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ecblank.gif
Type: image/gif
Size: 45 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190115/9226d2af/attachment.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1D690169.gif
Type: image/gif
Size: 4659 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190115/9226d2af/attachment-0001.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190115/9226d2af/attachment-0002.gif>

From S.J.Thompson at bham.ac.uk  Tue Jan 15 15:57:39 2019
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Tue, 15 Jan 2019 15:57:39 +0000
Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using
 pmsensors
Message-ID: <FE520345-7813-42C4-92B7-3F25A0BEAEB4@bham.ac.uk>

Understand that you don?t want to install Grafana on its own, but there is a GPFS Grafana bridge I believe that would allow you to include the GPFS collected data in a Grafana dashboard. So if not wanting to setup sensors for that data is the reason you don?t want Grafana, then using the bridge might pull the data you want?

Simon

From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "cabrillo at ifca.unican.es" <cabrillo at ifca.unican.es>
Reply-To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date: Tuesday, 15 January 2019 at 11:05
To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors

Dear,
    The gpfsgui dashboard show us most part of relevant information for cluster management. Avoiding to install other plot utilities (like graphana for example), we want to explore the possibility to use this packages to harvest and plot this information, in order to centralize the graph management in one only place. We see this information arrives to the gpfsgui node (from non gpfs cluster nodes), but we can't show the plots.
        Is there any way to use the pmsensor and pmcollector packages to monitorice / plot non gpfs cluster nodes using the gpfsgui dashboard ?
Regards, I

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190115/114880c8/attachment.htm>

From A.Wolf-Reber at de.ibm.com  Wed Jan 16 08:16:58 2019
From: A.Wolf-Reber at de.ibm.com (Alexander Wolf)
Date: Wed, 16 Jan 2019 08:16:58 +0000
Subject: [gpfsug-discuss] Get list offilesets_without_runningmmlsfileset?
In-Reply-To: <OFC097E468.FCB57B6B-ON03258383.004F2153-03258383.005124A9@notes.na.collabserv.com>
References: <OFC097E468.FCB57B6B-ON03258383.004F2153-03258383.005124A9@notes.na.collabserv.com>,
	<u569ppa8lavhogh5slk01oju.1547438997683@emailhosting.com><5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu>
Message-ID: <OF9182B43C.B5511A6C-ON00258384.002D4175-00258384.002D7FE7@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190116/22c8982a/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.15475476039319.png
Type: image/png
Size: 1134 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190116/22c8982a/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.154754760393110.png
Type: image/png
Size: 6645 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190116/22c8982a/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.154754760393111.png
Type: image/png
Size: 1134 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190116/22c8982a/attachment-0002.png>

From makaplan at us.ibm.com  Wed Jan 16 12:57:18 2019
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Wed, 16 Jan 2019 09:57:18 -0300
Subject: [gpfsug-discuss] Get fileset and other info via Rest API and/or
	GUI
In-Reply-To: <OF9182B43C.B5511A6C-ON00258384.002D4175-00258384.002D7FE7@notes.na.collabserv.com>
References: <OFC097E468.FCB57B6B-ON03258383.004F2153-03258383.005124A9@notes.na.collabserv.com>,
	<u569ppa8lavhogh5slk01oju.1547438997683@emailhosting.com><5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu>
	<OF9182B43C.B5511A6C-ON00258384.002D4175-00258384.002D7FE7@notes.na.collabserv.com>
Message-ID: <OF556A00F1.967FCAA1-ON03258384.00465E04-03258384.00472B61@notes.na.collabserv.com>

Good to know the "Rest" does it for us. Since I started working on GPFS 
internals and CLI utitlities around Release 3.x, I confess I never had 
need of the GUI or the Rest API server.   In fact I do most of my work 
remotely via Putty/Xterm/Emacs and only once-in-a-while even have an 
XWindows or VNC server/view of a GPFS node!  So consider any of my remarks 
in that context.  So I certainly defer to others when it comes to Spectrum 
Scale GUIs, "Protocol" servers and such.   If I'm missing anything great, 
perhaps some kind soul will send me a note offline from this public forum.

--Marc.K of GPFS


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190116/e410560c/attachment.htm>

From spectrumscale at kiranghag.com  Wed Jan 16 16:18:16 2019
From: spectrumscale at kiranghag.com (KG)
Date: Wed, 16 Jan 2019 21:48:16 +0530
Subject: [gpfsug-discuss] Filesystem automount issues
Message-ID: <CAA-1hNYt=SpQtOKV8ZRxqZzcKWr_X_4HD=C98fPviYEwYh=GoQ@mail.gmail.com>

Hi

IHAC running Scale 5.x on RHEL 7.5

One out of two filesystems (/home) does not get mounted automatically at
boot. (/home is scale filesystem)

The scale log does mention that the filesystem is mounted but mount output
says otherwise.

There are no entries for /home in fstab since we let scale mount it.
Automount on scale and filesystem both have been set to yes.

Any pointers to troubleshoot would be appreciated.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190116/7ceaab36/attachment.htm>

From stockf at us.ibm.com  Wed Jan 16 16:33:25 2019
From: stockf at us.ibm.com (Frederick Stock)
Date: Wed, 16 Jan 2019 11:33:25 -0500
Subject: [gpfsug-discuss] Filesystem automount issues
In-Reply-To: <CAA-1hNYt=SpQtOKV8ZRxqZzcKWr_X_4HD=C98fPviYEwYh=GoQ@mail.gmail.com>
References: <CAA-1hNYt=SpQtOKV8ZRxqZzcKWr_X_4HD=C98fPviYEwYh=GoQ@mail.gmail.com>
Message-ID: <OF8D6372C1.29181929-ON85258384.005AEADC-85258384.005AF366@notes.na.collabserv.com>

What does the output of "mmlsmount all -L" show?

Fred
__________________________________________________
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
stockf at us.ibm.com


From:   KG <spectrumscale at kiranghag.com>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   01/16/2019 11:19 AM
Subject:        [gpfsug-discuss] Filesystem automount issues
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


Hi 

IHAC running Scale 5.x on RHEL 7.5

One out of two filesystems (/home) does not get mounted automatically at 
boot. (/home is scale filesystem)

The scale log does mention that the filesystem is mounted but mount output 
says otherwise.

There are no entries for /home in fstab since we let scale mount it. 
Automount on scale and filesystem both have been set to yes.

Any pointers to troubleshoot would be appreciated.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190116/daba5610/attachment.htm>

From spectrumscale at kiranghag.com  Wed Jan 16 18:14:39 2019
From: spectrumscale at kiranghag.com (KG)
Date: Wed, 16 Jan 2019 23:44:39 +0530
Subject: [gpfsug-discuss] Filesystem automount issues
In-Reply-To: <OF8D6372C1.29181929-ON85258384.005AEADC-85258384.005AF366@notes.na.collabserv.com>
References: <CAA-1hNYt=SpQtOKV8ZRxqZzcKWr_X_4HD=C98fPviYEwYh=GoQ@mail.gmail.com>
	<OF8D6372C1.29181929-ON85258384.005AEADC-85258384.005AF366@notes.na.collabserv.com>
Message-ID: <CAA-1hNaUD=6PL2doE=PtFy8YGN_47QkWyxBALU-MydzjSMZeBg@mail.gmail.com>

It shows that the filesystem is not mounted

On Wed, Jan 16, 2019, 22:03 Frederick Stock <stockf at us.ibm.com wrote:

> What does the output of "mmlsmount all -L" show?
>
> Fred
> __________________________________________________
> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
> stockf at us.ibm.com
>
>
>
> From:        KG <spectrumscale at kiranghag.com>
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:        01/16/2019 11:19 AM
> Subject:        [gpfsug-discuss] Filesystem automount issues
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> Hi
>
> IHAC running Scale 5.x on RHEL 7.5
>
> One out of two filesystems (/home) does not get mounted automatically at
> boot. (/home is scale filesystem)
>
> The scale log does mention that the filesystem is mounted but mount output
> says otherwise.
>
> There are no entries for /home in fstab since we let scale mount it.
> Automount on scale and filesystem both have been set to yes.
>
> Any pointers to troubleshoot would be appreciated.
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190116/cca3a395/attachment.htm>

From stockf at us.ibm.com  Wed Jan 16 18:38:07 2019
From: stockf at us.ibm.com (Frederick Stock)
Date: Wed, 16 Jan 2019 13:38:07 -0500
Subject: [gpfsug-discuss] Filesystem automount issues
In-Reply-To: <CAA-1hNaUD=6PL2doE=PtFy8YGN_47QkWyxBALU-MydzjSMZeBg@mail.gmail.com>
References: <CAA-1hNYt=SpQtOKV8ZRxqZzcKWr_X_4HD=C98fPviYEwYh=GoQ@mail.gmail.com><OF8D6372C1.29181929-ON85258384.005AEADC-85258384.005AF366@notes.na.collabserv.com>
	<CAA-1hNaUD=6PL2doE=PtFy8YGN_47QkWyxBALU-MydzjSMZeBg@mail.gmail.com>
Message-ID: <OF8C494357.2F2BFF5C-ON85258384.00664EFE-85258384.00665E19@notes.na.collabserv.com>

Would it be possible for you to include the output of "mmlsmount all -L" 
and "df -k" in your response?

Fred
__________________________________________________
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
stockf at us.ibm.com


From:   KG <spectrumscale at kiranghag.com>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   01/16/2019 01:15 PM
Subject:        Re: [gpfsug-discuss] Filesystem automount issues
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


It shows that the filesystem is not mounted

On Wed, Jan 16, 2019, 22:03 Frederick Stock <stockf at us.ibm.com wrote:
What does the output of "mmlsmount all -L" show?

Fred
__________________________________________________
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
stockf at us.ibm.com


From:        KG <spectrumscale at kiranghag.com>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        01/16/2019 11:19 AM
Subject:        [gpfsug-discuss] Filesystem automount issues
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


Hi 

IHAC running Scale 5.x on RHEL 7.5

One out of two filesystems (/home) does not get mounted automatically at 
boot. (/home is scale filesystem)

The scale log does mention that the filesystem is mounted but mount output 
says otherwise.

There are no entries for /home in fstab since we let scale mount it. 
Automount on scale and filesystem both have been set to yes.

Any pointers to troubleshoot would be appreciated.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190116/e82897c8/attachment.htm>

From olaf.weiser at de.ibm.com  Wed Jan 16 20:01:53 2019
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Wed, 16 Jan 2019 21:01:53 +0100
Subject: [gpfsug-discuss] Filesystem automount issues
In-Reply-To: <OF8C494357.2F2BFF5C-ON85258384.00664EFE-85258384.00665E19@notes.na.collabserv.com>
References: <CAA-1hNYt=SpQtOKV8ZRxqZzcKWr_X_4HD=C98fPviYEwYh=GoQ@mail.gmail.com><OF8D6372C1.29181929-ON85258384.005AEADC-85258384.005AF366@notes.na.collabserv.com><CAA-1hNaUD=6PL2doE=PtFy8YGN_47QkWyxBALU-MydzjSMZeBg@mail.gmail.com>
	<OF8C494357.2F2BFF5C-ON85258384.00664EFE-85258384.00665E19@notes.na.collabserv.com>
Message-ID: <OFE6A8C713.3305FB69-ONC1258384.006D3A26-C1258384.006E090F@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190116/c33f91ec/attachment.htm>

From S.J.Thompson at bham.ac.uk  Thu Jan 17 11:35:13 2019
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Thu, 17 Jan 2019 11:35:13 +0000
Subject: [gpfsug-discuss] Node expels
Message-ID: <A5C3E1A4-DC1A-4983-9C32-2A9DE2F10C57@bham.ac.uk>

We?ve recently been seeing quite a few node expels with messages of the form:

2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster <c0n236> (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5
2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster
2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode
2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon.
2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message.

On the client node, we see messages of the form:

2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33
2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node.
2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33
2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node.
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b
ber-les-nsd01-data.bb2.cluster in rds.gpfs.server
2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56

They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them.

On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories).

Any suggestions on what ?(socket 153) state is unexpected? means?

Thanks

Simon


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190117/e5ded89e/attachment.htm>

From TOMP at il.ibm.com  Thu Jan 17 11:46:19 2019
From: TOMP at il.ibm.com (Tomer Perry)
Date: Thu, 17 Jan 2019 13:46:19 +0200
Subject: [gpfsug-discuss] Node expels
In-Reply-To: <A5C3E1A4-DC1A-4983-9C32-2A9DE2F10C57@bham.ac.uk>
References: <A5C3E1A4-DC1A-4983-9C32-2A9DE2F10C57@bham.ac.uk>
Message-ID: <OFBE7422E0.78B38B96-ONC2258385.00408661-C2258385.0040AA47@notes.na.collabserv.com>

Simon,

Take a look at 
http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf 
slide 13.


Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: tomp at il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:    +1 720 3422758
Israel Tel:      +972 3 9188625
Mobile:         +972 52 2554625


From:   Simon Thompson <S.J.Thompson at bham.ac.uk>
To:     "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:   17/01/2019 13:35
Subject:        [gpfsug-discuss] Node expels
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


We?ve recently been seeing quite a few node expels with messages of the 
form:
 
2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 
10.20.0.58 proto-pg-pf01.bear.cluster <c0n236> (socket 153) state is 
unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 
probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 
rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5
2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data 
collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster
2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to 
proto-pg-pf01.bear.cluster localNode
2019-01-17_11:19:30.882+0000: [I] Calling user exit script 
gpfsSendRequestToNodes: event sendRequestToNodes, Async command 
/usr/lpp/mmfs/bin/mmcommon.
2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a 
commMsgCheckMessages reply from node 10.20.0.58 
proto-pg-pf01.bear.cluster. Sending expel message.
 
On the client node, we see messages of the form:
 
2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection 
request from 10.10.0.33
2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug 
data on this node.
2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection 
request from 10.10.0.33
2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data 
on this node.
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster 
rds.gpfs.servers due to expel msg from 10.10.12.41 (b
ber-les-nsd01-data.bb2.cluster in rds.gpfs.server
2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection 
request from 10.20.0.56
 
They always appear to be to a specific type of hardware with the same 
Ethernet controller, though the nodes are split across three data centres 
and we aren?t seeing link congestion on the links between them.
 
On the node I listed above, it?s not actually doing anything either as the 
software on it is still being installed (i.e. it?s not doing GPFS or any 
other IO other than a couple of home directories).
 
Any suggestions on what ?(socket 153) state is unexpected? means?
 
Thanks
 
Simon
 
 _______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190117/94dd36b8/attachment.htm>

From TOMP at il.ibm.com  Thu Jan 17 13:28:15 2019
From: TOMP at il.ibm.com (Tomer Perry)
Date: Thu, 17 Jan 2019 15:28:15 +0200
Subject: [gpfsug-discuss] Node expels
In-Reply-To: <OFBE7422E0.78B38B96-ONC2258385.00408661-C2258385.0040AA47@notes.na.collabserv.com>
References: <A5C3E1A4-DC1A-4983-9C32-2A9DE2F10C57@bham.ac.uk>
	<OFBE7422E0.78B38B96-ONC2258385.00408661-C2258385.0040AA47@notes.na.collabserv.com>
Message-ID: <OF8D485B42.80E48D28-ONC2258385.00497FA3-C2258385.0049FF45@notes.na.collabserv.com>

Hi,

I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen).

As written on the slide:
One of the best ways to determine if a network layer problem is root cause 
for an expel is to look at the low-level socket details dumped in the 
?extra? log data (mmfs dump all) saved as part of automatic data 
collection on Linux GPFS nodes. 

So, the idea is that in expel situation, we dump the socket state from the 
OS ( you can see the same using 'ss -i' for example).
In your example, it shows that the ca_state is 4, there are retransmits, 
high rto and all the point to a network problem.
You can find more details here: 
http://www.yonch.com/tech/linux-tcp-congestion-control-internals


Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: tomp at il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:    +1 720 3422758
Israel Tel:      +972 3 9188625
Mobile:         +972 52 2554625


From:   "Tomer Perry" <TOMP at il.ibm.com>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   17/01/2019 13:46
Subject:        Re: [gpfsug-discuss] Node expels
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


Simon,

Take a look at 
http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf 
slide 13.


Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: tomp at il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:    +1 720 3422758
Israel Tel:      +972 3 9188625
Mobile:         +972 52 2554625


From:        Simon Thompson <S.J.Thompson at bham.ac.uk>
To:        "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:        17/01/2019 13:35
Subject:        [gpfsug-discuss] Node expels
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


We?ve recently been seeing quite a few node expels with messages of the 
form:
 
2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 
10.20.0.58 proto-pg-pf01.bear.cluster <c0n236> (socket 153) state is 
unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 
probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 
rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5
2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data 
collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster
2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to 
proto-pg-pf01.bear.cluster localNode
2019-01-17_11:19:30.882+0000: [I] Calling user exit script 
gpfsSendRequestToNodes: event sendRequestToNodes, Async command 
/usr/lpp/mmfs/bin/mmcommon.
2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a 
commMsgCheckMessages reply from node 10.20.0.58 
proto-pg-pf01.bear.cluster. Sending expel message.
 
On the client node, we see messages of the form:
 
2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection 
request from 10.10.0.33
2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug 
data on this node.
2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection 
request from 10.10.0.33
2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data 
on this node.
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster 
rds.gpfs.servers due to expel msg from 10.10.12.41 (b
ber-les-nsd01-data.bb2.cluster in rds.gpfs.server
2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection 
request from 10.20.0.56
 
They always appear to be to a specific type of hardware with the same 
Ethernet controller, though the nodes are split across three data centres 
and we aren?t seeing link congestion on the links between them.
 
On the node I listed above, it?s not actually doing anything either as the 
software on it is still being installed (i.e. it?s not doing GPFS or any 
other IO other than a couple of home directories).
 
Any suggestions on what ?(socket 153) state is unexpected? means?
 
Thanks
 
Simon
 
 _______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190117/45f3bbdf/attachment.htm>

From jlewars at us.ibm.com  Thu Jan 17 14:30:45 2019
From: jlewars at us.ibm.com (John Lewars)
Date: Thu, 17 Jan 2019 09:30:45 -0500
Subject: [gpfsug-discuss] Node expels
In-Reply-To: <OF8D485B42.80E48D28-ONC2258385.00497FA3-C2258385.0049F88F@LocalDomain>
References: <A5C3E1A4-DC1A-4983-9C32-2A9DE2F10C57@bham.ac.uk>
	<OFBE7422E0.78B38B96-ONC2258385.00408661-C2258385.0040AA47@notes.na.collabserv.com>
	<OF8D485B42.80E48D28-ONC2258385.00497FA3-C2258385.0049F88F@LocalDomain>
Message-ID: <OFFA6836B4.406B5B85-ON85258385.004E611F-85258385.004FB83B@notes.na.collabserv.com>

>They always appear to be to a specific type of hardware with the same 
Ethernet controller, 

That makes me think you might be seeing packet loss that could require 
ring buffer tuning (the defaults and limits will differ with different 
ethernet adapters). 

The expel section in the slides on this page has been expanded to include 
a 'debugging expels section' (slides 19-20, which also reference ring 
buffer tuning):
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381


Regards,
John Lewars 
Spectrum Scale Performance, IBM Poughkeepsie


From:   Tomer Perry/Israel/IBM
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Cc:     John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN
Date:   01/17/2019 08:28 AM
Subject:        Re: [gpfsug-discuss] Node expels


Hi,

I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen).

As written on the slide:
One of the best ways to determine if a network layer problem is root cause 
for an expel is to look at the low-level socket details dumped in the 
?extra? log data (mmfs dump all) saved as part of automatic data 
collection on Linux GPFS nodes. 

So, the idea is that in expel situation, we dump the socket state from the 
OS ( you can see the same using 'ss -i' for example).
In your example, it shows that the ca_state is 4, there are retransmits, 
high rto and all the point to a network problem.
You can find more details here: 
http://www.yonch.com/tech/linux-tcp-congestion-control-internals


Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: tomp at il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:    +1 720 3422758
Israel Tel:      +972 3 9188625
Mobile:         +972 52 2554625


From:   "Tomer Perry" <TOMP at il.ibm.com>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   17/01/2019 13:46
Subject:        Re: [gpfsug-discuss] Node expels
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


Simon,

Take a look at 
http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf 
slide 13.


Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: tomp at il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:    +1 720 3422758
Israel Tel:      +972 3 9188625
Mobile:         +972 52 2554625


From:        Simon Thompson <S.J.Thompson at bham.ac.uk>
To:        "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:        17/01/2019 13:35
Subject:        [gpfsug-discuss] Node expels
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


We?ve recently been seeing quite a few node expels with messages of the 
form:
 
2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 
10.20.0.58 proto-pg-pf01.bear.cluster <c0n236> (socket 153) state is 
unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 
probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 
rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5
2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data 
collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster
2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to 
proto-pg-pf01.bear.cluster localNode
2019-01-17_11:19:30.882+0000: [I] Calling user exit script 
gpfsSendRequestToNodes: event sendRequestToNodes, Async command 
/usr/lpp/mmfs/bin/mmcommon.
2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a 
commMsgCheckMessages reply from node 10.20.0.58 
proto-pg-pf01.bear.cluster. Sending expel message.
 
On the client node, we see messages of the form:
 
2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection 
request from 10.10.0.33
2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug 
data on this node.
2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection 
request from 10.10.0.33
2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data 
on this node.
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster 
rds.gpfs.servers due to expel msg from 10.10.12.41 (b
ber-les-nsd01-data.bb2.cluster in rds.gpfs.server
2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection 
request from 10.20.0.56
 
They always appear to be to a specific type of hardware with the same 
Ethernet controller, though the nodes are split across three data centres 
and we aren?t seeing link congestion on the links between them.
 
On the node I listed above, it?s not actually doing anything either as the 
software on it is still being installed (i.e. it?s not doing GPFS or any 
other IO other than a couple of home directories).
 
Any suggestions on what ?(socket 153) state is unexpected? means?
 
Thanks
 
Simon
 
 _______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190117/737c2872/attachment.htm>

From S.J.Thompson at bham.ac.uk  Thu Jan 17 19:02:06 2019
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Thu, 17 Jan 2019 19:02:06 +0000
Subject: [gpfsug-discuss] Node expels
In-Reply-To: <OFFA6836B4.406B5B85-ON85258385.004E611F-85258385.004FB83B@notes.na.collabserv.com>
References: <A5C3E1A4-DC1A-4983-9C32-2A9DE2F10C57@bham.ac.uk>
	<OFBE7422E0.78B38B96-ONC2258385.00408661-C2258385.0040AA47@notes.na.collabserv.com>
	<OF8D485B42.80E48D28-ONC2258385.00497FA3-C2258385.0049F88F@LocalDomain>,
	<OFFA6836B4.406B5B85-ON85258385.004E611F-85258385.004FB83B@notes.na.collabserv.com>
Message-ID: <CF45EE16DEF2FE4B9AA7FF2B6EE2654501297B40A5@EX13.adf.bham.ac.uk>

So we've backed out a bunch of network tuning parameters we had set (based on the GPFS wiki pages), they've been set a while but um ... maybe they are causing issues.

Secondly, we've noticed in dump tscomm that we see connection broken to a node, and then the node ID is usually the same node, which is a bit weird to me.

We've also just updated firmware on the Intel nics (the x722) which is part of the Skylake board. And specifically its the newer skylake kit we see this problem on. We've a number of issues with the x722 firmware (like it won't even bring a link up when plugged into some of our 10GbE switches, but that's another story).

We've also dropped the bonded links from these nodes, just in case its related...

Simon

________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of jlewars at us.ibm.com [jlewars at us.ibm.com]
Sent: 17 January 2019 14:30
To: Tomer Perry; gpfsug main discussion list
Cc: Yong Ze Chen
Subject: Re: [gpfsug-discuss] Node expels

>They always appear to be to a specific type of hardware with the same Ethernet controller,

That makes me think you might be seeing packet loss that could require ring buffer tuning (the defaults and limits will differ with different ethernet adapters).

The expel section in the slides on this page has been expanded to include a 'debugging expels section' (slides 19-20, which also reference ring buffer tuning):
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381

Regards,
John Lewars
Spectrum Scale Performance, IBM Poughkeepsie


From:        Tomer Perry/Israel/IBM
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Cc:        John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN
Date:        01/17/2019 08:28 AM
Subject:        Re: [gpfsug-discuss] Node expels
________________________________


Hi,

I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen).

As written on the slide:
One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes.

So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example).
In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem.
You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals


Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: tomp at il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:    +1 720 3422758
Israel Tel:      +972 3 9188625
Mobile:         +972 52 2554625


From:        "Tomer Perry" <TOMP at il.ibm.com>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        17/01/2019 13:46
Subject:        Re: [gpfsug-discuss] Node expels
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________


Simon,

Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdfslide 13.


Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: tomp at il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:    +1 720 3422758
Israel Tel:      +972 3 9188625
Mobile:         +972 52 2554625


From:        Simon Thompson <S.J.Thompson at bham.ac.uk>
To:        "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date:        17/01/2019 13:35
Subject:        [gpfsug-discuss] Node expels
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________


We?ve recently been seeing quite a few node expels with messages of the form:

2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster <c0n236> (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5
2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster
2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode
2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon.
2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message.

On the client node, we see messages of the form:

2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33
2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node.
2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33
2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node.
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b
ber-les-nsd01-data.bb2.cluster in rds.gpfs.server
2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56

They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them.

On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories).

Any suggestions on what ?(socket 153) state is unexpected? means?

Thanks

Simon

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190117/220fb871/attachment.htm>

From orichards at pixitmedia.com  Thu Jan 17 20:52:50 2019
From: orichards at pixitmedia.com (Orlando Richards)
Date: Thu, 17 Jan 2019 20:52:50 +0000
Subject: [gpfsug-discuss] Node expels
In-Reply-To: <CF45EE16DEF2FE4B9AA7FF2B6EE2654501297B40A5@EX13.adf.bham.ac.uk>
References: <A5C3E1A4-DC1A-4983-9C32-2A9DE2F10C57@bham.ac.uk>
	<OFBE7422E0.78B38B96-ONC2258385.00408661-C2258385.0040AA47@notes.na.collabserv.com>
	<OF8D485B42.80E48D28-ONC2258385.00497FA3-C2258385.0049F88F@LocalDomain>
	<OFFA6836B4.406B5B85-ON85258385.004E611F-85258385.004FB83B@notes.na.collabserv.com>
	<CF45EE16DEF2FE4B9AA7FF2B6EE2654501297B40A5@EX13.adf.bham.ac.uk>
Message-ID: <4e0ea3c4-3076-e9a0-55c3-58f98be96d9b@pixitmedia.com>

Hi Simon,

We've had to disable the offload's for Intel cards in many situations 
with the i40e drivers - Redhat have an article about it: 
https://access.redhat.com/solutions/3662011

-------
Orlando


On 17/01/2019 19:02, Simon Thompson wrote:
> So we've backed out a bunch of network tuning parameters we had set 
> (based on the GPFS wiki pages), they've been set a while but um ... 
> maybe they are causing issues.
>
> Secondly, we've noticed in dump tscomm that we see connection broken 
> to a node, and then the node ID is usually the same node, which is a 
> bit weird to me.
>
> We've also just updated firmware on the Intel nics (the x722) which is 
> part of the Skylake board. And specifically its the newer skylake kit 
> we see this problem on. We've a number of issues with the x722 
> firmware (like it won't even bring a link up when plugged into some of 
> our 10GbE switches, but that's another story).
>
> We've also dropped the bonded links from these nodes, just in case its 
> related...
>
> Simon
>
> ------------------------------------------------------------------------
> *From:* gpfsug-discuss-bounces at spectrumscale.org 
> [gpfsug-discuss-bounces at spectrumscale.org] on behalf of 
> jlewars at us.ibm.com [jlewars at us.ibm.com]
> *Sent:* 17 January 2019 14:30
> *To:* Tomer Perry; gpfsug main discussion list
> *Cc:* Yong Ze Chen
> *Subject:* Re: [gpfsug-discuss] Node expels
>
> >They always appear to be to a specific type of hardware with the same 
> Ethernet controller,
>
> That makes me think you might be seeing packet loss that could require 
> ring buffer tuning (the defaults and limits will differ with different 
> ethernet adapters).
>
> The expel section in the slides on this page has been expanded to 
> include a 'debugging expels section' (slides 19-20, which also 
> reference ring buffer tuning):
> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381
>
> Regards,
> John Lewars
> Spectrum Scale Performance, IBM Poughkeepsie
>
>
>
>
> From: Tomer Perry/Israel/IBM
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Cc: John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN
> Date: 01/17/2019 08:28 AM
> Subject: Re: [gpfsug-discuss] Node expels
> ------------------------------------------------------------------------
>
>
> Hi,
>
> I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen).
>
> As written on the slide:
> One of the best ways to determine if a network layer problem is root 
> cause for an expel is to look at the low-level socket details dumped 
> in the ?extra? log data (mmfs dump all) saved as part of automatic 
> data collection on Linux GPFS nodes.
>
> So, the idea is that in expel situation, we dump the socket state from 
> the OS ( you can see the same using 'ss -i' for example).
> In your example, it shows that the ca_state is 4, there are 
> retransmits, high rto and all the point to a network problem.
> You can find more details here: 
> http://www.yonch.com/tech/linux-tcp-congestion-control-internals
>
>
> Regards,
>
> Tomer Perry
> Scalable I/O Development (Spectrum Scale)
> email: tomp at il.ibm.com
> 1 Azrieli Center, Tel Aviv 67021, Israel
> Global Tel: ? ?+1 720 3422758
> Israel Tel: ? ? ?+972 3 9188625
> Mobile: ? ? ? ? +972 52 2554625
>
>
>
>
>
> From: "Tomer Perry" <TOMP at il.ibm.com>
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date: 17/01/2019 13:46
> Subject: Re: [gpfsug-discuss] Node expels
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------------------------------------------------
>
>
>
> Simon,
>
> Take a look at 
> _http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf_slide 
> 13.
>
>
> Regards,
>
> Tomer Perry
> Scalable I/O Development (Spectrum Scale)
> email: tomp at il.ibm.com
> 1 Azrieli Center, Tel Aviv 67021, Israel
> Global Tel: ? ?+1 720 3422758
> Israel Tel: ? ? ?+972 3 9188625
> Mobile: ? ? ? ? +972 52 2554625
>
>
>
>
> From: Simon Thompson <S.J.Thompson at bham.ac.uk>
> To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
> Date: 17/01/2019 13:35
> Subject: [gpfsug-discuss] Node expels
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------------------------------------------------
>
>
>
> We?ve recently been seeing quite a few node expels with messages of 
> the form:
>
> 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 
> 10.20.0.58 proto-pg-pf01.bear.cluster <c0n236> (socket 153) state is 
> unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 
> probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 
> rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5
> 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data 
> collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster
> 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug 
> data to proto-pg-pf01.bear.cluster localNode
> 2019-01-17_11:19:30.882+0000: [I] Calling user exit script 
> gpfsSendRequestToNodes: event sendRequestToNodes, Async command 
> /usr/lpp/mmfs/bin/mmcommon.
> 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for 
> a commMsgCheckMessages reply from node 10.20.0.58 
> proto-pg-pf01.bear.cluster. Sending expel message.
>
> On the client node, we see messages of the form:
>
> 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data 
> collection request from 10.10.0.33
> 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp 
> debug data on this node.
> 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data 
> collection request from 10.10.0.33
> 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug 
> data on this node.
> 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from 
> cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b
> ber-les-nsd01-data.bb2.cluster in rds.gpfs.server
> 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data 
> collection request from 10.20.0.56
>
> They always appear to be to a specific type of hardware with the same 
> Ethernet controller, though the nodes are split across three data 
> centres and we aren?t seeing link congestion on the links between them.
>
> On the node I listed above, it?s not actually doing anything either as 
> the software on it is still being installed (i.e. it?s not doing GPFS 
> or any other IO other than a couple of home directories).
>
> Any suggestions on what ?(socket 153) state is unexpected? means?
>
> Thanks
>
> Simon
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org_
> __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-- 
 <http://pixitmedia.com>
This email is confidential in that it is intended 
for the exclusive attention of the addressee(s) indicated. If you are not 
the intended recipient, this email should not be read or disclosed to any 
other person. Please notify the sender immediately and delete this email 
from your computer system. Any opinions expressed are not necessarily those 
of the company from which this email was sent and, whilst to the best of 
our knowledge no viruses or defects exist, no responsibility can be 
accepted for any loss or damage arising from its receipt or subsequent use 
of this email.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190117/e320b299/attachment.htm>

From jonathan.buzzard at strath.ac.uk  Fri Jan 18 15:23:09 2019
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Fri, 18 Jan 2019 15:23:09 +0000
Subject: [gpfsug-discuss] DSS-G
Message-ID: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk>


Anyone out their with a DSS-G using SR650 servers?

We have one and after some hassle we have finally got the access to the
software downloads and I have been reading through the documentation to
familiarize myself with the upgrade procedure.

Skipping over the shear madness of that which appears to involved doing
a complete netboot reisntall of the nodes for every upgrade, it looks
like we have wrong hardware. It all came in a Lenovo rack with factory
cabling so one assumes it would be correct.

However the "Manufactoring Preload Procedure" document says

    The DSS-G installation scripts assume that IPMI access to the
    servers is set up through the first regular 1GbE Ethernet port
    of the server (marked with a green star in figure 21) in shared
    mode, not through the dedicated IPMI port under the first three
    PCIe slots of the SR650 server?s back, and not on the lower left
    side of the x3650 M5 server?s back.

Except our SR650's have 2x10GbE SFP+ LOM and the XCC is connected to
the dedicated IPMI port. Oh great, reinstalling the OS for an update is
already giving me the screaming heebie jeebies, but now my factory
delivered setup is wrong. So in my book increased chance of the install
procedure writing all over the disks during install and blowing away
the NSD's. Last time I was involved in an net install of RHEL (well
CentOS but makes little difference) onto a GPFS not with attached disks
the installer wrote all over the NSD descriptors and destroyed the file
system.

So before one plays war with Lenovo for shipping an unsupported
configuration I was wondering how other DSS-G's with SR650's have come
from the factory.

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From S.J.Thompson at bham.ac.uk  Fri Jan 18 16:02:48 2019
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Fri, 18 Jan 2019 16:02:48 +0000
Subject: [gpfsug-discuss] DSS-G
In-Reply-To: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk>
References: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk>
Message-ID: <70C48D1B-4E99-4831-A9D9-AFD326154D8A@bham.ac.uk>

I have several. One of mine was shipped for customer rack (which happened to be an existing Lenovo rack anyway), the other was based on 3560m5 so cabled differently then anyway (and its now a franken DSS-G as we upgraded the servers to SR650 and added an SSD tray, but I have so much non-standard Lenovo config stuff in our systems ....)

If you bond the LOM ports together then you can't use the XCC in shared mode. But the installer scripts will make it shared when you reinstall/upgrade. Well, it can half work in some cases depending on how you have your switch connected. For example we set the switch to fail back to non-bond mode (relatively common now), which is find when the OS is not booted, you can talk to XCC. But as soon as the OS boots and it bonds, the switch port turns into a bond/trunk port and BAM, you can no longer talk to the XCC port.

We have an xcat post script to put it back to being dedicated on the XCC port. So during install you lose access for a little while whilst the Lenovo script runs before my script puts it back again.

And if you read the upgrade guide, then it tells you to unplug the SAS ports before doing the reinstall (OK I haven't checked the 2.2a upgrade guide, but it always did). HOWEVER, the xcat template for DSS-G should also black list the SAS driver to prevent it seeing the attached JBOD storage. AND GPFS now writes proper GPT headers as well to the disks which the installer should then leave alone. (But yes, haven't we all done an install and wiped the disk headers ... GPFS works great until you try to mount the file-system sometime later)

On the needing to reinstall ... I agree I don't like the reinstall to upgrade between releases, but if you look what it's doing it sorta half makes sense. For example it force flashes an exact validated firmware onto the SAS cards and forces the port config etc onto the card to being in a known current state. I don't like it, but I see why it's done like that. We have in the past picked the relevant bits out (e.g. disk firmware and GPFS packages), and done just those, THIS IS NOT SUPPORTED, but we did pick it apart to see what had changed.

If you go to 2.2a as well, the gui is now moved out (it was a bad idea to install on the DSS-G nodes anyway I'm sure), and the pmcollector package magically doesn't get installed either on the DSS-G nodes.

Oh AND, the LOM ports ... if you upgrade to DSS-G 2.2a, that will flash the firmware to Intel 4.0 release for the X722. And that doesn't work if you have Mellanox Ethernet switches running Cumulus.  (we proved it was the firmware by upgrading another SR650 to the latest firmware and suddenly it no longer works) - you won't get a link up, even at PXE time so not a driver issue. And if you have a VDX switch you need another workaround ...

Simon

?On 18/01/2019, 15:38, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" <gpfsug-discuss-bounces at spectrumscale.org on behalf of jonathan.buzzard at strath.ac.uk> wrote:

    
    Anyone out their with a DSS-G using SR650 servers?
    
    We have one and after some hassle we have finally got the access to the
    software downloads and I have been reading through the documentation to
    familiarize myself with the upgrade procedure.
    
    Skipping over the shear madness of that which appears to involved doing
    a complete netboot reisntall of the nodes for every upgrade, it looks
    like we have wrong hardware. It all came in a Lenovo rack with factory
    cabling so one assumes it would be correct.
    
    However the "Manufactoring Preload Procedure" document says
    
        The DSS-G installation scripts assume that IPMI access to the
        servers is set up through the first regular 1GbE Ethernet port
        of the server (marked with a green star in figure 21) in shared
        mode, not through the dedicated IPMI port under the first three
        PCIe slots of the SR650 server?s back, and not on the lower left
        side of the x3650 M5 server?s back.
    
    Except our SR650's have 2x10GbE SFP+ LOM and the XCC is connected to
    the dedicated IPMI port. Oh great, reinstalling the OS for an update is
    already giving me the screaming heebie jeebies, but now my factory
    delivered setup is wrong. So in my book increased chance of the install
    procedure writing all over the disks during install and blowing away
    the NSD's. Last time I was involved in an net install of RHEL (well
    CentOS but makes little difference) onto a GPFS not with attached disks
    the installer wrote all over the NSD descriptors and destroyed the file
    system.
    
    So before one plays war with Lenovo for shipping an unsupported
    configuration I was wondering how other DSS-G's with SR650's have come
    from the factory.
    
    JAB.
    
    -- 
    Jonathan A. Buzzard                         Tel: +44141-5483420
    HPC System Administrator, ARCHIE-WeSt.
    University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
    
    
    _______________________________________________
    gpfsug-discuss mailing list
    gpfsug-discuss at spectrumscale.org
    http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    

From jonathan.buzzard at strath.ac.uk  Fri Jan 18 17:14:52 2019
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Fri, 18 Jan 2019 17:14:52 +0000
Subject: [gpfsug-discuss] DSS-G
In-Reply-To: <70C48D1B-4E99-4831-A9D9-AFD326154D8A@bham.ac.uk>
References: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk>
	<70C48D1B-4E99-4831-A9D9-AFD326154D8A@bham.ac.uk>
Message-ID: <901117abe1768c9d02aae3b6cc9b5cf47dc3cc97.camel@strath.ac.uk>

On Fri, 2019-01-18 at 16:02 +0000, Simon Thompson wrote:

[SNIP]

> 
> If you bond the LOM ports together then you can't use the XCC in
> shared mode. But the installer scripts will make it shared when you
> reinstall/upgrade. Well, it can half work in some cases depending on
> how you have your switch connected. For example we set the switch to
> fail back to non-bond mode (relatively common now), which is find
> when the OS is not booted, you can talk to XCC. But as soon as the OS
> boots and it bonds, the switch port turns into a bond/trunk port and
> BAM, you can no longer talk to the XCC port.

We don't have that issue :-) Currently there is nothing plugged into
the LOM because we are using the Mellanox ConnectX4 card for bonded
40Gbps Ethernet to carry the GPFS traffic in the main with one of the
ports on the two cards set to Infiniband so the storage can be mounted
on an old cluster which only has 1Gb Ethernet (new cluster uses 10GbE
networking to carry storage).

However we have a shortage of 10GbE ports and the documentation says it
should be 1GbE anyway, hence asking what Lenovo might have shipped to
other people, as we have a disparity between what has been shipped and
what the documentation says it should be like.

[SNIP]

> And if you read the upgrade guide, then it tells you to unplug the
> SAS ports before doing the reinstall (OK I haven't checked the 2.2a
> upgrade guide, but it always did).

Well the 2.2a documentation does not say anything about that :-) I had
basically decided however it was going to be necessary for safety
purposes. While I do have a full backup of the file system I don't want
to have to use it.

>  HOWEVER, the xcat template for DSS-G should also black list the SAS
> driver to prevent it seeing the attached JBOD storage. AND GPFS now
> writes proper GPT headers as well to the disks which the installer
> should then leave alone. (But yes, haven't we all done an install and
> wiped the disk headers ... GPFS works great until you try to mount
> the file-system sometime later)

Well I have never wiped my NSD's, just the numpty getting ready to
prepare the CentOS6 upgrade for the cluster forgot to unzone the
storage arrays (cluster had FC attached storage to all nodes for
performance reasons, back in the day 4Gb FC was a lot cheaper than
10GbE and 1GbE was not fast enough) and wiped it for me :-(

> On the needing to reinstall ... I agree I don't like the reinstall to
> upgrade between releases, but if you look what it's doing it sorta
> half makes sense. For example it force flashes an exact validated
> firmware onto the SAS cards and forces the port config etc onto the
> card to being in a known current state. I don't like it, but I see
> why it's done like that.

Except that does not require a reinstall of the OS to achieve.
Reinstalling from scratch for an update is complete madness IMHO.

> 
> If you go to 2.2a as well, the gui is now moved out (it was a bad
> idea to install on the DSS-G nodes anyway I'm sure), and the
> pmcollector package magically doesn't get installed either on the
> DSS-G nodes.
> 

Currently we don't have the GUI installed anywhere. I am not sure I
trust IBM yet to not change the GUI completely again to be bothered
getting it to work.

> Oh AND, the LOM ports ... if you upgrade to DSS-G 2.2a, that will
> flash the firmware to Intel 4.0 release for the X722. And that
> doesn't work if you have Mellanox Ethernet switches running
> Cumulus.  (we proved it was the firmware by upgrading another SR650
> to the latest firmware and suddenly it no longer works) - you won't
> get a link up, even at PXE time so not a driver issue. And if you
> have a VDX switch you need another workaround ...
> 

We have Lenovo switches, so hopefully Lenovo tested with their own
switches work ;-)

Mind you I get this running the dssgcktopology tool

    Warning: Unsupported configuration of odd number of enclosures detected.

Which nitwit wrote that script then? From the "Manufacturing Preload
Procedure" for 2.2a on page 9

    For the high density DSS models DSS-G210, DSS-G220, DSS-G240 and
    DSS-G260 with 3.5? NL-SAS disks (7.2k RPM), the DSS-G building
    block contains one, two, four or six Lenovo D3284 disk enclosures.

Right so what is it then? Because one enclosure which is clearly an odd
number of enclosures is allegedly an unsupported configuration
according to the tool, but supported according to the documentation!!!


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


From matthew.robinson02 at gmail.com  Fri Jan 18 19:25:35 2019
From: matthew.robinson02 at gmail.com (Matthew Robinson)
Date: Fri, 18 Jan 2019 14:25:35 -0500
Subject: [gpfsug-discuss] DSS-G
In-Reply-To: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk>
References: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk>
Message-ID: <CADvPS3goEsw-xUSTKw7-JgYL6qSBNOMU06C2rh0H-GTtqLifWQ@mail.gmail.com>

Hi Jonathan,

In the last DSS 2.x tarballs there should a PDG included. This should
provide alot of detail going over the solutions configuration and common
problems for troubleshooting. Or at least the Problem Determantion Guide
was there be for my department let me go. The shared IMM port is pretty
standard from the 3650 to the SD530's for the most part.

You should have a port marked shared on either and the IPMI interace is to
be shared mode for dual subnet masks on the same NIC. This is is the
standard xcat configuration from Sourcforge. If I am not mistaken the PDG
should be stored in the first DSS-G version tarball for reference.

Hope this helps,
Matthew Robinson

On Fri, Jan 18, 2019 at 10:23 AM Jonathan Buzzard <
jonathan.buzzard at strath.ac.uk> wrote:

>
> Anyone out their with a DSS-G using SR650 servers?
>
> We have one and after some hassle we have finally got the access to the
> software downloads and I have been reading through the documentation to
> familiarize myself with the upgrade procedure.
>
> Skipping over the shear madness of that which appears to involved doing
> a complete netboot reisntall of the nodes for every upgrade, it looks
> like we have wrong hardware. It all came in a Lenovo rack with factory
> cabling so one assumes it would be correct.
>
> However the "Manufactoring Preload Procedure" document says
>
>     The DSS-G installation scripts assume that IPMI access to the
>     servers is set up through the first regular 1GbE Ethernet port
>     of the server (marked with a green star in figure 21) in shared
>     mode, not through the dedicated IPMI port under the first three
>     PCIe slots of the SR650 server?s back, and not on the lower left
>     side of the x3650 M5 server?s back.
>
> Except our SR650's have 2x10GbE SFP+ LOM and the XCC is connected to
> the dedicated IPMI port. Oh great, reinstalling the OS for an update is
> already giving me the screaming heebie jeebies, but now my factory
> delivered setup is wrong. So in my book increased chance of the install
> procedure writing all over the disks during install and blowing away
> the NSD's. Last time I was involved in an net install of RHEL (well
> CentOS but makes little difference) onto a GPFS not with attached disks
> the installer wrote all over the NSD descriptors and destroyed the file
> system.
>
> So before one plays war with Lenovo for shipping an unsupported
> configuration I was wondering how other DSS-G's with SR650's have come
> from the factory.
>
> JAB.
>
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>


-- 
Matthew Robinson
Comptia A+, Net+
919.909.0494
matthew.robinson02 at gmail.com

The greatest discovery of my generation is that man can alter his life
simply by altering his attitude of mind. - William James, Harvard
Psychologist.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190118/2ed6ccb1/attachment.htm>

From Renar.Grunenberg at huk-coburg.de  Mon Jan 21 15:59:29 2019
From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar)
Date: Mon, 21 Jan 2019 15:59:29 +0000
Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
Message-ID: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de>

Hello All,
We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but
the execution of mm commands in cygwin is very slow.
We have tried the following adjustments to increase the execution speed.


  *   We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution.
  *   Adaptation of the hosts file: 127.0.0.1 localhost cygdrive wpad
to prevent any DNS problems when accessing ?/cygdrive/...?
  *   Started them as Administrator

All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used)

Regards Renar


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:        09561 96-44110
Telefax:        09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet:       www.huk.de
________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190121/ad0062ea/attachment.htm>

From spectrumscale at kiranghag.com  Mon Jan 21 16:03:13 2019
From: spectrumscale at kiranghag.com (KG)
Date: Mon, 21 Jan 2019 21:33:13 +0530
Subject: [gpfsug-discuss] Dr site using full replication?
Message-ID: <CAA-1hNbG5ZvqP=-29hzK_vFGnhgdfcewPo8W-GQ6soBnqoT-Mg@mail.gmail.com>

Hi Folks

Has anyone replicated scale node to a dr site by replicating boot disks and
nsd ?

The same hostnames and ip subnet would be available on the other site and
cluster should be able to operate from any one location at a time.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190121/ffc8f08b/attachment.htm>

From Kevin.Buterbaugh at Vanderbilt.Edu  Mon Jan 21 16:02:50 2019
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Mon, 21 Jan 2019 16:02:50 +0000
Subject: [gpfsug-discuss] Get list of
 filesets_without_runningmmlsfileset?
In-Reply-To: <OFC097E468.FCB57B6B-ON03258383.004F2153-03258383.005124A9@notes.na.collabserv.com>
References: <u569ppa8lavhogh5slk01oju.1547438997683@emailhosting.com>
	<5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu>
	<OFC097E468.FCB57B6B-ON03258383.004F2153-03258383.005124A9@notes.na.collabserv.com>
Message-ID: <60451989-2E0B-4CF9-A6E2-BC0939169311@vanderbilt.edu>

Hi All,

I just wanted to follow up on this thread ? the only way I have found to obtain a list of filesets and their associated junction paths as a non-root user is via the REST API (and thanks to those who suggested that).  However, AFAICT querying the REST API via a script would expose the username / password used to do so to anyone who bothered to look at the code, which would in turn allow a knowledgeable and curious user to query the REST API themselves for other information we do not necessarily want to expose to them.  Therefore, it is not an acceptable solution to us.

Therefore, unless someone responds with a way to allow a non-root user to obtain fileset junction paths that doesn?t involve the REST API, I?m afraid I?m at a dead end in terms of making our quota usage Python script something that I can share with the broader community.  It just has too much site-specific code in it.  Sorry?

Kevin

P.S.  In case you?re curious about how the quota script is obtaining those junction paths ? we have a cron job that runs once per hour on the cluster manager that dumps the output of mmlsfileset to a text file, which the script then reads.  The cron job used to just run once per day and used to just run mmlsfileset.  I have modified it to be a shell script which checks for the load average on the cluster manager being less than 10 and that there are no waiters of more than 10 seconds duration.  If both of those conditions are true, it runs mmlsfileset.  If either are not, it simply exits ? the idea being that one or both of those would likely be true if something were going on with the cluster manager that would cause the mmlsfileset to hang.

I have also modified the quota script itself so that it checks that the junction path for a fileset actually exists before attempting to stat it (duh - should?ve done that from the start), which handles the case where a user would run the quota script and it would bomb off with an exception because the fileset was deleted and the cron job hadn?t run yet.  If a new fileset is created, well, it just won?t get checked by the quota script until the cron job runs successfully.  We have decided that this is an acceptable compromise.

On Jan 15, 2019, at 8:46 AM, Marc A Kaplan <makaplan at us.ibm.com<mailto:makaplan at us.ibm.com>> wrote:

Personally, I agree that there ought to be a way in the product.

In the meawhile, you no doubt already have some ways to tell your users where to find their filesets as pathnames.
Otherwise, how are they accessing their files?

And to keep things somewhat sane, I'd bet filesets are all linked to one or small number of well known paths in the filesystem.
Like  /AGpfsFilesystem/filesets/...  Plus you could add symlinks and/or as has been suggested post info extracted from mmlsfileset and/or mmlsquota.

So as a practical matter, is this an urgent problem...?  Why?  How?
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>

?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190121/8f528a35/attachment.htm>

From jeverdon at us.ibm.com  Mon Jan 21 22:41:26 2019
From: jeverdon at us.ibm.com (Jodi E Everdon)
Date: Mon, 21 Jan 2019 17:41:26 -0500
Subject: [gpfsug-discuss] post to list
Message-ID: <OF412D2905.0D2F93FD-ON00258389.007CA149-85258389.007CA4F5@notes.na.collabserv.com>


 Jodi Everdon                                                   IBM 
                                                                    
 New Technology Introduction (NTI)                  2455 South Road 
 Client Experience Validation                Poughkeepsie, NY 12601 
 Email: jeverdon at us.ibm.com                           North America 
 IBM IT Infrastructure:                                             
 www.ibm.com/it-infrastructure                                      
                                                                    
                                                                    
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190121/57123671/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 15606074.gif
Type: image/gif
Size: 1851 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190121/57123671/attachment.gif>

From scale at us.ibm.com  Mon Jan 21 23:34:31 2019
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Mon, 21 Jan 2019 15:34:31 -0800
Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
In-Reply-To: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de>
References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de>
Message-ID: <OFAAA4B565.31A1217E-ON88258389.008056FE-88258389.0081815D@notes.na.collabserv.com>

Hello Renar,

A few things to try:

Make sure IPv6 is disabled. On each Windows node, run "mmcmi  host 
<hostname>", with <hostname> being itself and each and every node in the 
cluster. Make sure mmcmi prints valid IPv4 address.
To eliminate DNS issues, try adding IPv4 entries for each cluster node in 
"c:\windows\system32\drivers\etc\hosts".
If any anti-virus is active, disable realtime scanning on c:\cygwin64 
(wherever you installed cygwin 64-bit).

You can also try debugging a script, say: (from GPFS ksh):  DEBUG=1 
mmlscluster, and see what takes time.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of  Spectrum Scale 
(GPFS), then please post it to the public IBM developerWroks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479
. 

If your query concerns a potential software error in Spectrum Scale (GPFS) 
and you have an IBM software maintenance contract please contact 
1-800-237-5511 in the United States or your local IBM Service Center in 
other countries. 

The forum is informally monitored as time permits and should not be used 
for priority messages to the Spectrum Scale (GPFS) team.


From:   "Grunenberg, Renar" <Renar.Grunenberg at huk-coburg.de>
To:     "'gpfsug-discuss at spectrumscale.org'" 
<gpfsug-discuss at spectrumscale.org>
Date:   01/21/2019 08:01 AM
Subject:        [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


Hello All,
We test spectrum scale on an windows only Client-Cluster (remote mounted 
to a linux Cluster) but
the execution of mm commands in cygwin is very slow. 
We have tried the following adjustments to increase the execution speed.
 
We have installed Cygwin Server as a service (cygserver-config). 
Unfortunately, this resulted in no faster execution.
Adaptation of the hosts file: 127.0.0.1 localhost cygdrive wpad
to prevent any DNS problems when accessing ?/cygdrive/...?
Started them as Administrator
 
All adjustments have so far not led to any improvement. Are there any 
hints to enhance the cmd execution time on windows (w2k12 actual used)
 
Regards Renar
 
Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:
09561 96-44110
Telefax:
09561 96-44104
E-Mail:
Renar.Grunenberg at huk-coburg.de
Internet:
www.huk.de
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter 
Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav 
Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte 
Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich 
erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese 
Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht 
ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information 
in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in 
this information is strictly forbidden.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=frR4WiYT89JSgLnJMtRAlESzRXWW2YatEwsuuV8M810&s=FSjMBxMo8G8y3VR2A59hgIWaHPKPFNHU7RXcneIVCPE&e=


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190121/6702aa28/attachment.htm>

From A.Wolf-Reber at de.ibm.com  Tue Jan 22 07:36:15 2019
From: A.Wolf-Reber at de.ibm.com (Alexander Wolf)
Date: Tue, 22 Jan 2019 07:36:15 +0000
Subject: [gpfsug-discuss] Get list of
	filesets_without_runningmmlsfileset?
In-Reply-To: <60451989-2E0B-4CF9-A6E2-BC0939169311@vanderbilt.edu>
References: <60451989-2E0B-4CF9-A6E2-BC0939169311@vanderbilt.edu>,
	<u569ppa8lavhogh5slk01oju.1547438997683@emailhosting.com><5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu><OFC097E468.FCB57B6B-ON03258383.004F2153-03258383.005124A9@notes.na.collabserv.com>
Message-ID: <OF5C8172C6.4148DDC6-ON0025838A.00297DA1-0025838A.0029C5B6@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190122/339bab98/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.15481420128480.png
Type: image/png
Size: 1134 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190122/339bab98/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.15481420128481.png
Type: image/png
Size: 6645 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190122/339bab98/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.15481420128482.png
Type: image/png
Size: 1134 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190122/339bab98/attachment-0002.png>

From S.J.Thompson at bham.ac.uk  Tue Jan 22 14:35:02 2019
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Tue, 22 Jan 2019 14:35:02 +0000
Subject: [gpfsug-discuss] Node expels
In-Reply-To: <OFFA6836B4.406B5B85-ON85258385.004E611F-85258385.004FB83B@notes.na.collabserv.com>
References: <A5C3E1A4-DC1A-4983-9C32-2A9DE2F10C57@bham.ac.uk>
	<OFBE7422E0.78B38B96-ONC2258385.00408661-C2258385.0040AA47@notes.na.collabserv.com>
	<OF8D485B42.80E48D28-ONC2258385.00497FA3-C2258385.0049F88F@LocalDomain>
	<OFFA6836B4.406B5B85-ON85258385.004E611F-85258385.004FB83B@notes.na.collabserv.com>
Message-ID: <0B0D4ACE-1B54-4D22-85E3-B3154DD7C943@bham.ac.uk>

OK we think we might have a reason for this.

We run iptables on some of our management function nodes, and we found that in some cases, our config management tool can cause a ?systemctl restart iptables? to occur (the rule ordering generation was non deterministic meaning it could shuffle rules ? we fixed that and made it reload rather than restart). Which takes a fraction of a second, but it appears that this is sufficient for GPFS to get into a state. What I didn?t mention before was that we could get it into a state where the only way to recover was to shutdown the storage cluster and restart it.

I?m not sure why normal expel and recovery doesn?t appear to work in this case, though we?re not 100% certain that its iptables restart. (we just have a very smoky gun at present). (I have a ticket with that question open).

Maybe it?s a combination of having a default DROP policy on iptables as well - we have also switched to ACCEPT and added a DROP rule at the end of the ruleset which gives the same result.

Simon

From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "jlewars at us.ibm.com" <jlewars at us.ibm.com>
Reply-To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date: Thursday, 17 January 2019 at 14:31
To: Tomer Perry <TOMP at il.ibm.com>, "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Cc: Yong Ze Chen <yongzech at cn.ibm.com>
Subject: Re: [gpfsug-discuss] Node expels

>They always appear to be to a specific type of hardware with the same Ethernet controller,

That makes me think you might be seeing packet loss that could require ring buffer tuning (the defaults and limits will differ with different ethernet adapters).

The expel section in the slides on this page has been expanded to include a 'debugging expels section' (slides 19-20, which also reference ring buffer tuning):
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381

Regards,
John Lewars
Spectrum Scale Performance, IBM Poughkeepsie


From:        Tomer Perry/Israel/IBM
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Cc:        John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN
Date:        01/17/2019 08:28 AM
Subject:        Re: [gpfsug-discuss] Node expels
________________________________


Hi,

I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen).

As written on the slide:
One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes.

So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example).
In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem.
You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals


Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: tomp at il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:    +1 720 3422758
Israel Tel:      +972 3 9188625
Mobile:         +972 52 2554625


From:        "Tomer Perry" <TOMP at il.ibm.com>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        17/01/2019 13:46
Subject:        Re: [gpfsug-discuss] Node expels
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________


Simon,

Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdfslide 13.


Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: tomp at il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:    +1 720 3422758
Israel Tel:      +972 3 9188625
Mobile:         +972 52 2554625


From:        Simon Thompson <S.J.Thompson at bham.ac.uk>
To:        "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date:        17/01/2019 13:35
Subject:        [gpfsug-discuss] Node expels
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________


We?ve recently been seeing quite a few node expels with messages of the form:

2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster <c0n236> (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5
2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster
2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode
2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon.
2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message.

On the client node, we see messages of the form:

2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33
2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node.
2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33
2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node.
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b
ber-les-nsd01-data.bb2.cluster in rds.gpfs.server
2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56

They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them.

On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories).

Any suggestions on what ?(socket 153) state is unexpected? means?

Thanks

Simon

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190122/3fe3c4d7/attachment.htm>

From rmoye at quantlab.com  Tue Jan 22 15:43:26 2019
From: rmoye at quantlab.com (Roger Moye)
Date: Tue, 22 Jan 2019 15:43:26 +0000
Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
In-Reply-To: <OFAAA4B565.31A1217E-ON88258389.008056FE-88258389.0081815D@notes.na.collabserv.com>
References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de>
	<OFAAA4B565.31A1217E-ON88258389.008056FE-88258389.0081815D@notes.na.collabserv.com>
Message-ID: <18bab23b080c4ad487c68b8ebc04b975@quantlab.com>

We experienced the same issue and were advised not to use Windows for quorum nodes.   We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it.   If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help.

-Roger


From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale
Sent: Monday, January 21, 2019 5:35 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

Hello Renar,

A few things to try:

  1.  Make sure IPv6 is disabled. On each Windows node, run "mmcmi  host  <hostname>", with <hostname> being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address.

  1.  To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts".

  1.  If any anti-virus is active, disable realtime scanning on c:\cygwin64  (wherever you installed cygwin 64-bit).

You can also try debugging a script, say: (from GPFS ksh):  DEBUG=1  mmlscluster, and see what takes time.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of  Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact  1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.


From:        "Grunenberg, Renar" <Renar.Grunenberg at huk-coburg.de<mailto:Renar.Grunenberg at huk-coburg.de>>
To:        "'gpfsug-discuss at spectrumscale.org'" <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Date:        01/21/2019 08:01 AM
Subject:        [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
Sent by:        gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>
________________________________


Hello All,

We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but

the execution of mm commands in cygwin is very slow.

We have tried the following adjustments to increase the execution speed.


  *   We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution.
  *   Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad
to prevent any DNS problems when accessing "/cygdrive/..."
  *   Started them as Administrator


All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used)


Regards Renar


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:

09561 96-44110

Telefax:

09561 96-44104

E-Mail:

Renar.Grunenberg at huk-coburg.de<mailto:Renar.Grunenberg at huk-coburg.de>

Internet:

www.huk.de


________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-----------------------------------------------------------------------------------

The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190122/0b540d0c/attachment.htm>

From Renar.Grunenberg at huk-coburg.de  Tue Jan 22 17:10:24 2019
From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar)
Date: Tue, 22 Jan 2019 17:10:24 +0000
Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
In-Reply-To: <18bab23b080c4ad487c68b8ebc04b975@quantlab.com>
References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de>
	<OFAAA4B565.31A1217E-ON88258389.008056FE-88258389.0081815D@notes.na.collabserv.com>
	<18bab23b080c4ad487c68b8ebc04b975@quantlab.com>
Message-ID: <be9f8bd00ba543e583d2cbf840de34d6@SMXRF105.msg.hukrf.de>

Hallo Roger,
first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem.
@Spectrum Scale Team:
Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses.
Point2. There are no DNS-Issues
Point3: We must check these.
Any recommendations to Rogers statements?

Regards Renar


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:        09561 96-44110
Telefax:        09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet:       www.huk.de
________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________
Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye
Gesendet: Dienstag, 22. Januar 2019 16:43
An: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

We experienced the same issue and were advised not to use Windows for quorum nodes.   We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it.   If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help.

-Roger


From: gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale
Sent: Monday, January 21, 2019 5:35 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

Hello Renar,

A few things to try:

  1.  Make sure IPv6 is disabled. On each Windows node, run "mmcmi  host  <hostname>", with <hostname> being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address.

  1.  To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts".

  1.  If any anti-virus is active, disable realtime scanning on c:\cygwin64  (wherever you installed cygwin 64-bit).

You can also try debugging a script, say: (from GPFS ksh):  DEBUG=1  mmlscluster, and see what takes time.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of  Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact  1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.


From:        "Grunenberg, Renar" <Renar.Grunenberg at huk-coburg.de<mailto:Renar.Grunenberg at huk-coburg.de>>
To:        "'gpfsug-discuss at spectrumscale.org'" <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Date:        01/21/2019 08:01 AM
Subject:        [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
Sent by:        gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>
________________________________


Hello All,

We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but

the execution of mm commands in cygwin is very slow.

We have tried the following adjustments to increase the execution speed.


  *   We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution.
  *   Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad
to prevent any DNS problems when accessing ?/cygdrive/...?
  *   Started them as Administrator


All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used)


Regards Renar


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:

09561 96-44110

Telefax:

09561 96-44104

E-Mail:

Renar.Grunenberg at huk-coburg.de<mailto:Renar.Grunenberg at huk-coburg.de>

Internet:

www.huk.de


________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


 -----------------------------------------------------------------------------------

The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190122/4f874a1c/attachment.htm>

From Achim.Rehor at de.ibm.com  Tue Jan 22 18:18:03 2019
From: Achim.Rehor at de.ibm.com (Achim Rehor)
Date: Tue, 22 Jan 2019 19:18:03 +0100
Subject: [gpfsug-discuss] Node expels
In-Reply-To: <0B0D4ACE-1B54-4D22-85E3-B3154DD7C943@bham.ac.uk>
References: <A5C3E1A4-DC1A-4983-9C32-2A9DE2F10C57@bham.ac.uk><OFBE7422E0.78B38B96-ONC2258385.00408661-C2258385.0040AA47@notes.na.collabserv.com><OF8D485B42.80E48D28-ONC2258385.00497FA3-C2258385.0049F88F@LocalDomain><OFFA6836B4.406B5B85-ON85258385.004E611F-85258385.004FB83B@notes.na.collabserv.com>
	<0B0D4ACE-1B54-4D22-85E3-B3154DD7C943@bham.ac.uk>
Message-ID: <OF3FD5E84E.B9F4F55B-ONC125838A.0060E4FC-C125838A.00648750@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190122/45cf166e/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 7182 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190122/45cf166e/attachment.gif>

From Renar.Grunenberg at huk-coburg.de  Wed Jan 23 12:45:39 2019
From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar)
Date: Wed, 23 Jan 2019 12:45:39 +0000
Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
In-Reply-To: <be9f8bd00ba543e583d2cbf840de34d6@SMXRF105.msg.hukrf.de>
References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de>
	<OFAAA4B565.31A1217E-ON88258389.008056FE-88258389.0081815D@notes.na.collabserv.com>
	<18bab23b080c4ad487c68b8ebc04b975@quantlab.com>
	<be9f8bd00ba543e583d2cbf840de34d6@SMXRF105.msg.hukrf.de>
Message-ID: <349cb338583a4c1d996677837fc65b6e@SMXRF105.msg.hukrf.de>

Hallo All,

as a point to the problem, it seems to be that all the delayes are happening here

DEBUG=1 mmgetstate ?a

??..
/bin/rm -f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256

Any points to this it will be fixed in the near future are welcome.

Regards Renar


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:        09561 96-44110
Telefax:        09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet:       www.huk.de
________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________
Von: Grunenberg, Renar
Gesendet: Dienstag, 22. Januar 2019 18:10
An: 'gpfsug main discussion list' <gpfsug-discuss at spectrumscale.org>
Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

Hallo Roger,
first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem.
@Spectrum Scale Team:
Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses.
Point2. There are no DNS-Issues
Point3: We must check these.
Any recommendations to Rogers statements?

Regards Renar

Von: gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org> [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye
Gesendet: Dienstag, 22. Januar 2019 16:43
An: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

We experienced the same issue and were advised not to use Windows for quorum nodes.   We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it.   If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help.

-Roger


From: gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale
Sent: Monday, January 21, 2019 5:35 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

Hello Renar,

A few things to try:

  1.  Make sure IPv6 is disabled. On each Windows node, run "mmcmi  host  <hostname>", with <hostname> being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address.

  1.  To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts".

  1.  If any anti-virus is active, disable realtime scanning on c:\cygwin64  (wherever you installed cygwin 64-bit).

You can also try debugging a script, say: (from GPFS ksh):  DEBUG=1  mmlscluster, and see what takes time.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of  Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact  1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.


From:        "Grunenberg, Renar" <Renar.Grunenberg at huk-coburg.de<mailto:Renar.Grunenberg at huk-coburg.de>>
To:        "'gpfsug-discuss at spectrumscale.org'" <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Date:        01/21/2019 08:01 AM
Subject:        [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
Sent by:        gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>
________________________________


Hello All,

We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but

the execution of mm commands in cygwin is very slow.

We have tried the following adjustments to increase the execution speed.


  *   We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution.
  *   Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad
to prevent any DNS problems when accessing ?/cygdrive/...?
  *   Started them as Administrator


All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used)


Regards Renar


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:

09561 96-44110

Telefax:

09561 96-44104

E-Mail:

Renar.Grunenberg at huk-coburg.de<mailto:Renar.Grunenberg at huk-coburg.de>

Internet:

www.huk.de


________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

 -----------------------------------------------------------------------------------

The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190123/eff7ad74/attachment.htm>

From heiner.billich at psi.ch  Thu Jan 24 14:29:42 2019
From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI))
Date: Thu, 24 Jan 2019 14:29:42 +0000
Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs?
Message-ID: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch>

Hello,

a local account on a nfs client couldn?t write to a ganesha nfs export even with directory permissions 777. The solution was to create the account on the ganesha servers, too.

Please can you confirm that this is the intended behaviour? is there an option to change this and to map unknown accounts to nobody instead? We often have embedded Linux appliances or similar as nfs clients which need to place some data on the nfs exports  using uid/gid of local accounts.

We manage gids on the server side and allow NFS v3 client access only.

I crosspost this to ganesha support and to the gpfsug mailing list.

Thank you,

Heiner Billich

ganesha version: 2.5.3-ibm028.00.el7.x86_64

the ganesha config

CacheInode
{
        fd_hwmark_percent=60;
        fd_lwmark_percent=20;
        fd_limit_percent=90;
        lru_run_interval=90;
        entries_hwmark=1500000;
}
NFS_Core_Param
{
        clustered=TRUE;
        rpc_max_connections=10000;
        heartbeat_freq=0;
        mnt_port=33247;
        nb_worker=256;
        nfs_port=2049;
        nfs_protocols=3,4;
        nlm_port=33245;
        rquota_port=33246;
        rquota_port=33246;
        short_file_handle=FALSE;
        mount_path_pseudo=true;
}
GPFS
{
        fsal_grace=FALSE;
        fsal_trace=TRUE;
}
NFSv4
{
        delegations=FALSE;
        domainname=virtual1.com;
        grace_period=60;
        lease_lifetime=60;
}
Export_Defaults
{
        access_type=none;
        anonymous_gid=-2;
        anonymous_uid=-2;
        manage_gids=TRUE;
        nfs_commit=FALSE;
        privilegedport=FALSE;
        protocols=3,4;
        sectype=sys;
        squash=root_squash;
        transports=TCP;
}

one export

# === START /**** id=206 nclients=3 ===
EXPORT {
            Attr_Expiration_Time=60;
            Delegations=none;
            Export_id=206;
            Filesystem_id=42.206;
            MaxOffsetRead=18446744073709551615;
            MaxOffsetWrite=18446744073709551615;
            MaxRead=1048576;
            MaxWrite=1048576;
            Path="/****";
            PrefRead=1048576;
            PrefReaddir=1048576;
            PrefWrite=1048576;
            Pseudo="/****";
            Tag="****";
            UseCookieVerifier=false;
            FSAL {
                        Name=GPFS;
            }
            CLIENT {
                # === ****/X12SA ===
                        Access_Type=RW;
                        Anonymous_gid=-2;
                        Anonymous_uid=-2;
                        Clients=X.Y.A.B/24;
                        Delegations=none;
                        Manage_Gids=TRUE;
                        NFS_Commit=FALSE;
                        PrivilegedPort=FALSE;
                        Protocols=3;
                        SecType=SYS;
                        Squash=Root;
                        Transports=TCP;
            }
?.
--
Paul Scherrer Institut
Heiner Billich
System Engineer Scientific Computing
Science IT / High Performance Computing
WHGA/106
Forschungsstrasse 111
5232 Villigen PSI
Switzerland

Phone +41 56 310 36 02
heiner.billich at psi.ch<mailto:heiner.billich at psi.ch>
https://www.psi.ch


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190124/98475db1/attachment.htm>

From truongv at us.ibm.com  Thu Jan 24 18:17:45 2019
From: truongv at us.ibm.com (Truong Vu)
Date: Thu, 24 Jan 2019 13:17:45 -0500
Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
In-Reply-To: <mailman.136.1548247543.1125.gpfsug-discuss@spectrumscale.org>
References: <mailman.136.1548247543.1125.gpfsug-discuss@spectrumscale.org>
Message-ID: <OFAC36337F.4D74A762-ON8525838C.0064074A-8525838C.00648091@notes.na.collabserv.com>


Hi Renar,

Let's see if it is really the /bin/rm is the problem here.  Can you run the
command again without cleanup the temp files as follow:

DEBUG=1 keepTempFiles=1 mmgetstate -a

Thanks,
Tru.


From:	gpfsug-discuss-request at spectrumscale.org
To:	gpfsug-discuss at spectrumscale.org
Date:	01/23/2019 07:46 AM
Subject:	gpfsug-discuss Digest, Vol 84, Issue 32
Sent by:	gpfsug-discuss-bounces at spectrumscale.org


Send gpfsug-discuss mailing list submissions to
		 gpfsug-discuss at spectrumscale.org

To subscribe or unsubscribe via the World Wide Web, visit

https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=UBffyp1tO8WZsaCys72XHljL9SyUe_v4ECCmymP17Lg&e=

or, via email, send a message with subject or body 'help' to
		 gpfsug-discuss-request at spectrumscale.org

You can reach the person managing the list at
		 gpfsug-discuss-owner at spectrumscale.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of gpfsug-discuss digest..."


Today's Topics:

   1. Re: Spectrum Scale Cygwin cmd delays (Grunenberg, Renar)


----------------------------------------------------------------------

Message: 1
Date: Wed, 23 Jan 2019 12:45:39 +0000
From: "Grunenberg, Renar" <Renar.Grunenberg at huk-coburg.de>
To: 'gpfsug main discussion list' <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
Message-ID: <349cb338583a4c1d996677837fc65b6e at SMXRF105.msg.hukrf.de>
Content-Type: text/plain; charset="utf-8"

Hallo All,

as a point to the problem, it seems to be that all the delayes are
happening here

DEBUG=1 mmgetstate ?a

??..
/bin/rm
-f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256


Any points to this it will be fixed in the near future are welcome.

Regards Renar


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:        09561 96-44110
Telefax:        09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet:       www.huk.de
________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter
Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav
Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte
Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich
erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese
Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist
nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in
error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in
this information is strictly forbidden.
________________________________
Von: Grunenberg, Renar
Gesendet: Dienstag, 22. Januar 2019 18:10
An: 'gpfsug main discussion list' <gpfsug-discuss at spectrumscale.org>
Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

Hallo Roger,
first thanks fort he tip. But we decided to separate the linux-io-Cluster
from the Windows client only cluster, because of security requirements and
ssh management requirements. We can use at this point, local named admins
on Windows and use on Linux a Deamon and an separated Admin-interface
Network for pwless root ssh. Your Hint seems to be CCR related or is this a
Cygwin problem.
@Spectrum Scale Team:
Point1: IP V6 can?t disabled because of applications that want to use this.
But the mmcmi cmd are give us already the right ipv4 adresses.
Point2. There are no DNS-Issues
Point3: We must check these.
Any recommendations to Rogers statements?

Regards Renar

Von: gpfsug-discuss-bounces at spectrumscale.org<
mailto:gpfsug-discuss-bounces at spectrumscale.org> [
mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye
Gesendet: Dienstag, 22. Januar 2019 16:43
An: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<
mailto:gpfsug-discuss at spectrumscale.org>>
Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

We experienced the same issue and were advised not to use Windows for
quorum nodes.   We moved our Windows nodes into the storage cluster which
was entirely Linux and that solved it.   If this is not an option, perhaps
adding some Linux nodes to your remote cluster as quorum nodes would help.

-Roger


From: gpfsug-discuss-bounces at spectrumscale.org<
mailto:gpfsug-discuss-bounces at spectrumscale.org> [
mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum
Scale
Sent: Monday, January 21, 2019 5:35 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<
mailto:gpfsug-discuss at spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

Hello Renar,

A few things to try:

  1.  Make sure IPv6 is disabled. On each Windows node, run "mmcmi  host
<hostname>", with <hostname> being itself and each and every node in the
cluster. Make sure mmcmi prints valid IPv4 address.

  1.  To eliminate DNS issues, try adding IPv4 entries for each cluster
node in "c:\windows\system32\drivers\etc\hosts".

  1.  If any anti-virus is active, disable realtime scanning on c:\cygwin64
(wherever you installed cygwin 64-bit).

You can also try debugging a script, say: (from GPFS ksh):  DEBUG=1
mmlscluster, and see what takes time.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------

If you feel that your question can benefit other users of  Spectrum Scale
(GPFS), then please post it to the public IBM developerWroks Forum at
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479
.

If your query concerns a potential software error in Spectrum Scale (GPFS)
and you have an IBM software maintenance contract please contact
1-800-237-5511 in the United States or your local IBM Service Center in
other countries.

The forum is informally monitored as time permits and should not be used
for priority messages to the Spectrum Scale (GPFS) team.


From:        "Grunenberg, Renar" <Renar.Grunenberg at huk-coburg.de<
mailto:Renar.Grunenberg at huk-coburg.de>>
To:        "'gpfsug-discuss at spectrumscale.org'"
<gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Date:        01/21/2019 08:01 AM
Subject:        [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
Sent by:        gpfsug-discuss-bounces at spectrumscale.org<
mailto:gpfsug-discuss-bounces at spectrumscale.org>
________________________________


Hello All,

We test spectrum scale on an windows only Client-Cluster (remote mounted to
a linux Cluster) but

the execution of mm commands in cygwin is very slow.

We have tried the following adjustments to increase the execution speed.


  *   We have installed Cygwin Server as a service (cygserver-config).
Unfortunately, this resulted in no faster execution.
  *   Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad
to prevent any DNS problems when accessing ?/cygdrive/...?
  *   Started them as Administrator


All adjustments have so far not led to any improvement. Are there any hints
to enhance the cmd execution time on windows (w2k12 actual used)


Regards Renar


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:

09561 96-44110

Telefax:

09561 96-44104

E-Mail:

Renar.Grunenberg at huk-coburg.de<mailto:Renar.Grunenberg at huk-coburg.de>

Internet:

www.huk.de


________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter
Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav
Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte
Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich
erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese
Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist
nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in
error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in
this information is strictly forbidden.
________________________________

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=UBffyp1tO8WZsaCys72XHljL9SyUe_v4ECCmymP17Lg&e=


-----------------------------------------------------------------------------------


The information in this communication and any attachment is confidential
and intended solely for the attention and use of the named addressee(s).
All information and opinions expressed herein are subject to change without
notice. This communication is not to be construed as an offer to sell or
the solicitation of an offer to buy any security. Any such offer or
solicitation can only be made by means of the delivery of a confidential
private offering memorandum (which should be carefully reviewed for a
complete description of investment strategies and risks). Any reliance one
may place on the accuracy or validity of this information is at their own
risk. Past performance is not necessarily indicative of the future results
of an investment. All figures are estimated and unaudited unless otherwise
noted. If you are not the intended recipient, or a person responsible for
delivering this to the intended recipient, you are not authorized to and
must not disclose, copy, distribute, o
 r retain this message or any part of it. In this case, please notify the
sender immediately at 713-333-5440
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20190123_eff7ad74_attachment.html&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=JWv1FytE6pkOdJtqJV5sSVf3ZwV0B9FDZmfzI7LQEGk&e=
>

------------------------------

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=UBffyp1tO8WZsaCys72XHljL9SyUe_v4ECCmymP17Lg&e=


End of gpfsug-discuss Digest, Vol 84, Issue 32
**********************************************


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190124/5009094e/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190124/5009094e/attachment.gif>

From heiner.billich at psi.ch  Fri Jan 25 09:13:53 2019
From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI))
Date: Fri, 25 Jan 2019 09:13:53 +0000
Subject: [gpfsug-discuss] [NFS-Ganesha-Support] does ganesha deny access
	for unknown UIDs?
In-Reply-To: <35897363-6096-89e9-d22c-ba97ad10c26f@redhat.com>
References: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch>
	<35897363-6096-89e9-d22c-ba97ad10c26f@redhat.com>
Message-ID: <1F7557E9-FE60-4F37-BA0A-FD4C37E124BD@psi.ch>

Hello Daniel,

thank you. The clients do NFS v3 mounts, hence idmap is no option - as I know it's used in NFS v4 to map between uid/guid and names only? For a process to switch to a certain uid/guid in general one does not need a  matching passwd entry? I see that with ACLs you get issues as they use names, and you can't do a server-side group membership lookup, and there may be more subtle issues.

Anyway, I'll create the needed accounts on the server. By the way: We had the same issue with Netapp filers and it took a while to  find the configuration option to allow 'unknown' uid/gid  to access a nfs v3 export.

I'll try  to reproduce on a test system with increased logging to see what exactly goes wrong and maybe ask later to add a configuration option to ganesha to switch to a behaviour more similar to kernel-nfs.

Many client systems at my site are legacy and run various operating systems, hence a complete switch to NFS v4 is unlikely to happen soon.

cheers,

Heiner 
--
Paul Scherrer Institut
Heiner Billich                           
System Engineer Scientific Computing
Science IT / High Performance Computing                 
WHGA/106                              
Forschungsstrasse 111
5232 Villigen PSI
Switzerland
 
Phone +41 56 310 36 02
heiner.billich at psi.ch 
https://www.psi.ch
 
 
?On 24/01/19 16:35, "Daniel Gryniewicz" <dang at redhat.com> wrote:

    Hi.
    
    For local operating FSALs (like GPFS and VFS), the way Ganesha makes 
    sure that a UID/GID combo has the correct permissions for an operation 
    is to set the UID/GID of the thread to the one in the operation, then 
    perform the actual operation.  This way, the kernel and the underlying 
    filesystem perform atomic permission checking on the op.  This 
    setuid/setgid will fail, of course, if the local system doesn't have 
    that UID/GID to set to.
    
    The solution for this is to use NFS idmap to map the remote ID to a 
    local one. This includes the ability to map unknown IDs to some local ID.
    
    Daniel
    
    On 1/24/19 9:29 AM, Billich Heinrich Rainer (PSI) wrote:
    > Hello,
    > 
    > a local account on a nfs client couldn?t write to a ganesha nfs export 
    > even with directory permissions 777. The solution was to create the 
    > account on the ganesha servers, too.
    > 
    > Please can you confirm that this is the intended behaviour? is there an 
    > option to change this and to map unknown accounts to nobody instead? We 
    > often have embedded Linux appliances or similar as nfs clients which 
    > need to place some data on the nfs exports  using uid/gid of local accounts.
    > 
    > We manage gids on the server side and allow NFS v3 client access only.
    > 
    > I crosspost this to ganesha support and to the gpfsug mailing list.
    > 
    > Thank you,
    > 
    > Heiner Billich
    > 
    > ganesha version: 2.5.3-ibm028.00.el7.x86_64
    
    
From andy_kurth at ncsu.edu  Fri Jan 25 16:08:12 2019
From: andy_kurth at ncsu.edu (Andy Kurth)
Date: Fri, 25 Jan 2019 11:08:12 -0500
Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs?
In-Reply-To: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch>
References: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch>
Message-ID: <CAD7o_Xw16_oi6ci23dSt0Z0Ja8p5JN0TG4+D0cT8WP5GZEgv3g@mail.gmail.com>

I believe this is occurring because of the manage_gids=TRUE setting.  The
purpose of this setting is to overcome the AUTH_SYS 16 group limit.  If
true, Ganesha takes the UID and resolves all of the GIDs on the server.  If
false, the GIDs sent by the client are used.

I ran a quick test by creating a local user on the client and exporting 2
shares with 777 permissions, one with manage_gids=TRUE and one with FALSE.

The user could view the share and create files with manage_gids=FALSE.
ganesha.log showed that it tried and failed to resolve the UID to a name,
but allowed the operation nonetheless:

2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain :
ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO
:nfs4_uid_to_name failed with code -2.
2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain :
ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :Lookup
for 779 failed, using numeric owner

With manage_gids=TRUE, the client received permission denied and
ganesha.log showed the GID query failing:

2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain :
ganesha.nfsd-123297[work-39] uid2grp_allocate_by_uid :ID MAPPER :INFO :No
matching password record found for uid 779
2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain :
ganesha.nfsd-123297[work-39] nfs_req_creds :DISP :INFO :Attempt to fetch
managed_gids failed

Hope this helps,
Andy Kurth / NC State University

On Thu, Jan 24, 2019 at 9:36 AM Billich Heinrich Rainer (PSI) <
heiner.billich at psi.ch> wrote:

> Hello,
>
>
>
> a local account on a nfs client couldn?t write to a ganesha nfs export
> even with directory permissions 777. The solution was to create the account
> on the ganesha servers, too.
>
>
>
> Please can you confirm that this is the intended behaviour? is there an
> option to change this and to map unknown accounts to nobody instead? We
> often have embedded Linux appliances or similar as nfs clients which need
> to place some data on the nfs exports  using uid/gid of local accounts.
>
>
>
> We manage gids on the server side and allow NFS v3 client access only.
>
>
>
> I crosspost this to ganesha support and to the gpfsug mailing list.
>
>
>
> Thank you,
>
>
>
> Heiner Billich
>
>
>
> ganesha version: 2.5.3-ibm028.00.el7.x86_64
>
>
>
> the ganesha config
>
>
>
> CacheInode
>
> {
>
>         fd_hwmark_percent=60;
>
>         fd_lwmark_percent=20;
>
>         fd_limit_percent=90;
>
>         lru_run_interval=90;
>
>         entries_hwmark=1500000;
>
> }
>
> NFS_Core_Param
>
> {
>
>         clustered=TRUE;
>
>         rpc_max_connections=10000;
>
>         heartbeat_freq=0;
>
>         mnt_port=33247;
>
>         nb_worker=256;
>
>         nfs_port=2049;
>
>         nfs_protocols=3,4;
>
>         nlm_port=33245;
>
>         rquota_port=33246;
>
>         rquota_port=33246;
>
>         short_file_handle=FALSE;
>
>         mount_path_pseudo=true;
>
> }
>
> GPFS
>
> {
>
>         fsal_grace=FALSE;
>
>         fsal_trace=TRUE;
>
> }
>
> NFSv4
>
> {
>
>         delegations=FALSE;
>
>         domainname=virtual1.com;
>
>         grace_period=60;
>
>         lease_lifetime=60;
>
> }
>
> Export_Defaults
>
> {
>
>         access_type=none;
>
>         anonymous_gid=-2;
>
>         anonymous_uid=-2;
>
>         manage_gids=TRUE;
>
>         nfs_commit=FALSE;
>
>         privilegedport=FALSE;
>
>         protocols=3,4;
>
>         sectype=sys;
>
>         squash=root_squash;
>
>         transports=TCP;
>
> }
>
>
>
> one export
>
>
>
> # === START /**** id=206 nclients=3 ===
>
> EXPORT {
>
>             Attr_Expiration_Time=60;
>
>             Delegations=none;
>
>             Export_id=206;
>
>             Filesystem_id=42.206;
>
>             MaxOffsetRead=18446744073709551615;
>
>             MaxOffsetWrite=18446744073709551615;
>
>             MaxRead=1048576;
>
>             MaxWrite=1048576;
>
>             Path="/****";
>
>             PrefRead=1048576;
>
>             PrefReaddir=1048576;
>
>             PrefWrite=1048576;
>
>             Pseudo="/****";
>
>             Tag="****";
>
>             UseCookieVerifier=false;
>
>             FSAL {
>
>                         Name=GPFS;
>
>             }
>
>             CLIENT {
>
>                 # === ****/X12SA ===
>
>                         Access_Type=RW;
>
>                         Anonymous_gid=-2;
>
>                         Anonymous_uid=-2;
>
>                         Clients=X.Y.A.B/24;
>
>                         Delegations=none;
>
>                         Manage_Gids=TRUE;
>
>                         NFS_Commit=FALSE;
>
>                         PrivilegedPort=FALSE;
>
>                         Protocols=3;
>
>                         SecType=SYS;
>
>                         Squash=Root;
>
>                         Transports=TCP;
>
>             }
>
> ?.
>
> --
>
> Paul Scherrer Institut
>
> Heiner Billich
>
> System Engineer Scientific Computing
>
> Science IT / High Performance Computing
>
> WHGA/106
>
> Forschungsstrasse 111
>
> 5232 Villigen PSI
>
> Switzerland
>
>
>
> Phone +41 56 310 36 02
>
> heiner.billich at psi.ch
>
> https://www.psi.ch
>
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>


-- 
*Andy Kurth*
Research Storage Specialist
NC State University
Office of Information Technology

P: 919-513-4090
311A Hillsborough Building
Campus Box 7109
Raleigh, NC 27695
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190125/fa75ca82/attachment.htm>

From Robert.Oesterlin at nuance.com  Fri Jan 25 18:07:06 2019
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Fri, 25 Jan 2019 18:07:06 +0000
Subject: [gpfsug-discuss] FW: 'Flash (Alert): IBM Spectrum Scale (GPFS)
 V4.1.1.0 through 5.0.1.1: a read from or write to a DMAPI-migrated file may
 result in undetected data corruption or...
Message-ID: <BCEF6705-161F-4AE3-8A6A-013A32E605DD@nuance.com>


[cid:forums.png] gpfs at us.ibm.com<https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_developerworks_community_profiles_html_profileView.do-3Fuserid-3D060000T9GF&d=DwMCaQ&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=3MROJ4IHysUdoFBuLcjdmIZoyhVZhkc5B7akC5d3UrQ&s=qaM8zM9taqauMm_9u78ag5vuhlwoHKRJx5fKVwZd7ro&e=> created a topic named Flash (Alert): IBM Spectrum Scale (GPFS) V4.1.1.0 through 5.0.1.1: a read from or write to a DMAPI-migrated file may result in undetected data corruption or a recall failure<https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_developerworks_community_forums_html_topic-3Fid-3D7fb394b0-2Db121-2D4e25-2D94fc-2D7634e434fafb&d=DwMCaQ&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=3MROJ4IHysUdoFBuLcjdmIZoyhVZhkc5B7akC5d3UrQ&s=zvkcT1meGoD98wIH019llMm-ccvSKHvVRh3vFuQbE30&e=> in the General Parallel File System - Announce (GPFS - Announce)<https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_developerworks_community_forums_html_forum-3Fid-3D11111111-2D0000-2D0000-2D0000-2D000000001606&d=DwMCaQ&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=3MROJ4IHysUdoFBuLcjdmIZoyhVZhkc5B7akC5d3UrQ&s=YH3QraUtetdbVKro1UeYM71AKeXQhAqpLrEmIHtNLMk&e=> forum.

Abstract

IBM has identified a problem in IBM Spectrum Scale V4.1.1.0 through 5.0.1.1, in which under some conditions reading a DMAPI-migrated file may return zeroes instead of the actual data. Further, a DMAPI-migrate operation or writing to a DMAPI-migrated file may cause the size of the stub file to be updated incorrectly, which may cause a mismatch between the file size recorded in the stub file and in the migrated object. This may result in failure of a manual or transparent recall, when triggered by a subsequent read from or write to the file.


See the complete bulletin at:  http://www.ibm.com/support/docview.wss?uid=ibm10741243<https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_support_docview.wss-3Fuid-3Dibm10741243&d=DwMCaQ&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=3MROJ4IHysUdoFBuLcjdmIZoyhVZhkc5B7akC5d3UrQ&s=5ISSBXUXtfdQwT3UCpTYivcYDsLl86bcnIffgyyR7jA&e=>
     Open this item<https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_developerworks_community_forums_html_topic-3Fid-3D7fb394b0-2Db121-2D4e25-2D94fc-2D7634e434fafb&d=DwMCaQ&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=3MROJ4IHysUdoFBuLcjdmIZoyhVZhkc5B7akC5d3UrQ&s=zvkcT1meGoD98wIH019llMm-ccvSKHvVRh3vFuQbE30&e=>


Posting Date:

Friday, January 25, 2019 at 11:31:20 AM EST


To unsubscribe or change settings, please go to your developerWorks community Settings.<https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_developerworks_community_news&d=DwMCaQ&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=3MROJ4IHysUdoFBuLcjdmIZoyhVZhkc5B7akC5d3UrQ&s=2haObCeGgZxUv289WrlvLEcJnvY7O0dgkxGfSRFniHE&e=>
This is a notification sent from developerWorks community<https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_developerworks_community_homepage&d=DwMCaQ&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=3MROJ4IHysUdoFBuLcjdmIZoyhVZhkc5B7akC5d3UrQ&s=uJ55QyLzBRklALK7HBI4OVOTAmxPwh77GUMiZfvNGpE&e=>.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190125/145e4a1f/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 289 bytes
Desc: image001.png
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190125/145e4a1f/attachment.png>

From S.J.Thompson at bham.ac.uk  Fri Jan 25 18:28:27 2019
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Fri, 25 Jan 2019 18:28:27 +0000
Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs?
In-Reply-To: <CAD7o_Xw16_oi6ci23dSt0Z0Ja8p5JN0TG4+D0cT8WP5GZEgv3g@mail.gmail.com>
References: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch>,
	<CAD7o_Xw16_oi6ci23dSt0Z0Ja8p5JN0TG4+D0cT8WP5GZEgv3g@mail.gmail.com>
Message-ID: <CF45EE16DEF2FE4B9AA7FF2B6EE2654501297BBDD2@EX13.adf.bham.ac.uk>

Note there are other limitations introduced by setting manage_gids. Whilst you get round the 16 group limit, instead ACLs are not properly interpreted to provide user access when an ACL is in place. In a PMR were told the only was around this would be to user sec_krb.

Simon
________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Andy Kurth [andy_kurth at ncsu.edu]
Sent: 25 January 2019 16:08
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] does ganesha deny access for unknown UIDs?

I believe this is occurring because of the manage_gids=TRUE setting.  The purpose of this setting is to overcome the AUTH_SYS 16 group limit.  If true, Ganesha takes the UID and resolves all of the GIDs on the server.  If false, the GIDs sent by the client are used.

I ran a quick test by creating a local user on the client and exporting 2 shares with 777 permissions, one with manage_gids=TRUE and one with FALSE.

The user could view the share and create files with manage_gids=FALSE.  ganesha.log showed that it tried and failed to resolve the UID to a name, but allowed the operation nonetheless:

2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :nfs4_uid_to_name failed with code -2.
2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :Lookup for 779 failed, using numeric owner

With manage_gids=TRUE, the client received permission denied and ganesha.log showed the GID query failing:

2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] uid2grp_allocate_by_uid :ID MAPPER :INFO :No matching password record found for uid 779
2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] nfs_req_creds :DISP :INFO :Attempt to fetch managed_gids failed

Hope this helps,
Andy Kurth / NC State University

On Thu, Jan 24, 2019 at 9:36 AM Billich Heinrich Rainer (PSI) <heiner.billich at psi.ch<mailto:heiner.billich at psi.ch>> wrote:
Hello,

a local account on a nfs client couldn?t write to a ganesha nfs export even with directory permissions 777. The solution was to create the account on the ganesha servers, too.

Please can you confirm that this is the intended behaviour? is there an option to change this and to map unknown accounts to nobody instead? We often have embedded Linux appliances or similar as nfs clients which need to place some data on the nfs exports  using uid/gid of local accounts.

We manage gids on the server side and allow NFS v3 client access only.

I crosspost this to ganesha support and to the gpfsug mailing list.

Thank you,

Heiner Billich

ganesha version: 2.5.3-ibm028.00.el7.x86_64

the ganesha config

CacheInode
{
        fd_hwmark_percent=60;
        fd_lwmark_percent=20;
        fd_limit_percent=90;
        lru_run_interval=90;
        entries_hwmark=1500000;
}
NFS_Core_Param
{
        clustered=TRUE;
        rpc_max_connections=10000;
        heartbeat_freq=0;
        mnt_port=33247;
        nb_worker=256;
        nfs_port=2049;
        nfs_protocols=3,4;
        nlm_port=33245;
        rquota_port=33246;
        rquota_port=33246;
        short_file_handle=FALSE;
        mount_path_pseudo=true;
}
GPFS
{
        fsal_grace=FALSE;
        fsal_trace=TRUE;
}
NFSv4
{
        delegations=FALSE;
        domainname=virtual1.com<http://virtual1.com>;
        grace_period=60;
        lease_lifetime=60;
}
Export_Defaults
{
        access_type=none;
        anonymous_gid=-2;
        anonymous_uid=-2;
        manage_gids=TRUE;
        nfs_commit=FALSE;
        privilegedport=FALSE;
        protocols=3,4;
        sectype=sys;
        squash=root_squash;
        transports=TCP;
}

one export

# === START /**** id=206 nclients=3 ===
EXPORT {
            Attr_Expiration_Time=60;
            Delegations=none;
            Export_id=206;
            Filesystem_id=42.206;
            MaxOffsetRead=18446744073709551615;
            MaxOffsetWrite=18446744073709551615;
            MaxRead=1048576;
            MaxWrite=1048576;
            Path="/****";
            PrefRead=1048576;
            PrefReaddir=1048576;
            PrefWrite=1048576;
            Pseudo="/****";
            Tag="****";
            UseCookieVerifier=false;
            FSAL {
                        Name=GPFS;
            }
            CLIENT {
                # === ****/X12SA ===
                        Access_Type=RW;
                        Anonymous_gid=-2;
                        Anonymous_uid=-2;
                        Clients=X.Y.A.B/24;
                        Delegations=none;
                        Manage_Gids=TRUE;
                        NFS_Commit=FALSE;
                        PrivilegedPort=FALSE;
                        Protocols=3;
                        SecType=SYS;
                        Squash=Root;
                        Transports=TCP;
            }
?.
--
Paul Scherrer Institut
Heiner Billich
System Engineer Scientific Computing
Science IT / High Performance Computing
WHGA/106
Forschungsstrasse 111
5232 Villigen PSI
Switzerland

Phone +41 56 310 36 02
heiner.billich at psi.ch<mailto:heiner.billich at psi.ch>
https://www.psi.ch


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


--
Andy Kurth
Research Storage Specialist
NC State University
Office of Information Technology

P: 919-513-4090
311A Hillsborough Building
Campus Box 7109
Raleigh, NC 27695


From mnaineni at in.ibm.com  Fri Jan 25 19:38:27 2019
From: mnaineni at in.ibm.com (Malahal R Naineni)
Date: Fri, 25 Jan 2019 19:38:27 +0000
Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs?
In-Reply-To: <CAD7o_Xw16_oi6ci23dSt0Z0Ja8p5JN0TG4+D0cT8WP5GZEgv3g@mail.gmail.com>
References: <CAD7o_Xw16_oi6ci23dSt0Z0Ja8p5JN0TG4+D0cT8WP5GZEgv3g@mail.gmail.com>,
	<44051794-8F45-4725-92E0-09729474E7A1@psi.ch>
Message-ID: <OFEFF095EA.E45066EB-ON0025838D.006B8B87-0025838D.006BE414@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190125/82be53fb/attachment.htm>

From chris.schlipalius at pawsey.org.au  Sat Jan 26 01:32:59 2019
From: chris.schlipalius at pawsey.org.au (Chris Schlipalius)
Date: Sat, 26 Jan 2019 09:32:59 +0800
Subject: [gpfsug-discuss] Announcing 2019 March 11th Singapore Spectrum
 Scale User Group event - call for user case speakers
Message-ID: <FDDF90AA-8CDB-460C-AB2F-82D22AFD98C1@pawsey.org.au>

Hello,
This is the announcement for the Spectrum Scale Usergroup Singapore on Monday 11th March 2019, Suntec Convention and Exhibition Centre, Singapore. 
This event is being held in conjunction with SCA19 https://sc-asia.org/
All current Singapore Spectrum Scale User Group event details can be found here: http://bit.ly/2FRur9d
We are calling for user case speakers please ? let Ulf, Xiang or myself know if you are available to speak at this Usergroup.
Feel free to circulate this event link to all who may need it.
Please reserve your tickets now as tickets for places will close soon.
There are some great speakers and topics, for details please see the agenda on Eventbrite.
We are looking forwards to a great Usergroup in a fabulous venue.
Thanks again to NSCC and IBM for helping to arrange the venue and event booking.
Regards,
    
Chris Schlipalius
IBM Champion 2019
Team Lead, Storage Infrastructure, Data & Visualisation, The Pawsey Supercomputing Centre (CSIRO)
1 Bryce Avenue
Kensington  WA  6151
Australia
Tel  +61 8 6436 8815  
Email  chris.schlipalius at pawsey.org.au
Web  www.pawsey.org.au <http://www.pawsey.org.au/>


From Renar.Grunenberg at huk-coburg.de  Mon Jan 28 08:36:45 2019
From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar)
Date: Mon, 28 Jan 2019 08:36:45 +0000
Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
In-Reply-To: <OFAC36337F.4D74A762-ON8525838C.0064074A-8525838C.00648091@notes.na.collabserv.com>
References: <mailman.136.1548247543.1125.gpfsug-discuss@spectrumscale.org>
	<OFAC36337F.4D74A762-ON8525838C.0064074A-8525838C.00648091@notes.na.collabserv.com>
Message-ID: <528da43a668745f38d68c0a82ecb53a3@SMXRF105.msg.hukrf.de>

Hallo Truong Vu,

unfortunality the results are the same, the cmd-responce are not what we want.
Ok, we want to analyze something with the trace facility and came to following link in the knowledge center:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_instracsupp.htm

The docu mentioned that we must copy to windows files, tracefmt.exe and tracelog.exe, but the first one are only available in the DDK-Version 7.1 (W2K3), not in the WDK Version 8 or 10. We use W2K12. Can you clarify where I can find the mentioned files.

Regards Renar.


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:        09561 96-44110
Telefax:        09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet:       www.huk.de
________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________
Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Truong Vu
Gesendet: Donnerstag, 24. Januar 2019 19:18
An: gpfsug-discuss at spectrumscale.org
Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays


Hi Renar,

Let's see if it is really the /bin/rm is the problem here. Can you run the command again without cleanup the temp files as follow:

DEBUG=1 keepTempFiles=1 mmgetstate -a

Thanks,
Tru.


[Inactive hide details for gpfsug-discuss-request---01/23/2019 07:46:30 AM---Send gpfsug-discuss mailing list submissions to  gp]gpfsug-discuss-request---01/23/2019 07:46:30 AM---Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>

From: gpfsug-discuss-request at spectrumscale.org<mailto:gpfsug-discuss-request at spectrumscale.org>
To: gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>
Date: 01/23/2019 07:46 AM
Subject: gpfsug-discuss Digest, Vol 84, Issue 32
Sent by: gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>

________________________________


Send gpfsug-discuss mailing list submissions to
gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>

To subscribe or unsubscribe via the World Wide Web, visit
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
or, via email, send a message with subject or body 'help' to
gpfsug-discuss-request at spectrumscale.org<mailto:gpfsug-discuss-request at spectrumscale.org>

You can reach the person managing the list at
gpfsug-discuss-owner at spectrumscale.org<mailto:gpfsug-discuss-owner at spectrumscale.org>

When replying, please edit your Subject line so it is more specific
than "Re: Contents of gpfsug-discuss digest..."


Today's Topics:

  1. Re: Spectrum Scale Cygwin cmd delays (Grunenberg, Renar)


----------------------------------------------------------------------

Message: 1
Date: Wed, 23 Jan 2019 12:45:39 +0000
From: "Grunenberg, Renar" <Renar.Grunenberg at huk-coburg.de<mailto:Renar.Grunenberg at huk-coburg.de>>
To: 'gpfsug main discussion list' <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
Message-ID: <349cb338583a4c1d996677837fc65b6e at SMXRF105.msg.hukrf.de<mailto:349cb338583a4c1d996677837fc65b6e at SMXRF105.msg.hukrf.de>>
Content-Type: text/plain; charset="utf-8"

Hallo All,

as a point to the problem, it seems to be that all the delayes are happening here

DEBUG=1 mmgetstate ?a

??..
/bin/rm -f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256

Any points to this it will be fixed in the near future are welcome.

Regards Renar


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:        09561 96-44110
Telefax:        09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de<mailto:Renar.Grunenberg at huk-coburg.de>
Internet:       www.huk.de<http://www.huk.de>
________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________
Von: Grunenberg, Renar
Gesendet: Dienstag, 22. Januar 2019 18:10
An: 'gpfsug main discussion list' <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

Hallo Roger,
first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem.
@Spectrum Scale Team:
Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses.
Point2. There are no DNS-Issues
Point3: We must check these.
Any recommendations to Rogers statements?

Regards Renar

Von: gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org> [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye
Gesendet: Dienstag, 22. Januar 2019 16:43
An: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

We experienced the same issue and were advised not to use Windows for quorum nodes.   We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it.   If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help.

-Roger


From: gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale
Sent: Monday, January 21, 2019 5:35 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

Hello Renar,

A few things to try:

 1.  Make sure IPv6 is disabled. On each Windows node, run "mmcmi  host  <hostname>", with <hostname> being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address.

 1.  To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts".

 1.  If any anti-virus is active, disable realtime scanning on c:\cygwin64  (wherever you installed cygwin 64-bit).

You can also try debugging a script, say: (from GPFS ksh):  DEBUG=1  mmlscluster, and see what takes time.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of  Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact  1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.


From:        "Grunenberg, Renar" <Renar.Grunenberg at huk-coburg.de<mailto:Renar.Grunenberg at huk-coburg.de>>
To:        "'gpfsug-discuss at spectrumscale.org'" <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Date:        01/21/2019 08:01 AM
Subject:        [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
Sent by:        gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>
________________________________


Hello All,

We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but

the execution of mm commands in cygwin is very slow.

We have tried the following adjustments to increase the execution speed.


 *   We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution.
 *   Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad
to prevent any DNS problems when accessing ?/cygdrive/...?
 *   Started them as Administrator


All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used)


Regards Renar


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:

09561 96-44110

Telefax:

09561 96-44104

E-Mail:

Renar.Grunenberg at huk-coburg.de<mailto:Renar.Grunenberg at huk-coburg.de>

Internet:

www.huk.de<http://www.huk.de>


________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-----------------------------------------------------------------------------------

The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, o
r retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20190123/eff7ad74/attachment.html>

------------------------------

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


End of gpfsug-discuss Digest, Vol 84, Issue 32
**********************************************


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190128/fd289504/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 105 bytes
Desc: image001.gif
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190128/fd289504/attachment.gif>

From scale at us.ibm.com  Tue Jan 29 00:20:47 2019
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Mon, 28 Jan 2019 16:20:47 -0800
Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
In-Reply-To: <528da43a668745f38d68c0a82ecb53a3@SMXRF105.msg.hukrf.de>
References: <mailman.136.1548247543.1125.gpfsug-discuss@spectrumscale.org><OFAC36337F.4D74A762-ON8525838C.0064074A-8525838C.00648091@notes.na.collabserv.com>
	<528da43a668745f38d68c0a82ecb53a3@SMXRF105.msg.hukrf.de>
Message-ID: <OF09CD7FD6.4B6C96EE-ON88258390.0081D542-88258391.0001E777@notes.na.collabserv.com>

Hello Renar,

I have WDK 8.1 installed and it does come with trace*.exe. 

Check this out:  
https://docs.microsoft.com/en-us/windows-hardware/drivers/devtest/tracefmt

If not the WDK, did you try your SDK/VisualStudio folders as indicated in 
the above link?

Nevertheless, I have uploaded trace*.exe here for you to download:  ftp 
testcase.software.ibm.com. Login as anonymous and provide your email as 
password. cd /fromibm/aix. mget trace*.exe. This site gets scrubbed often, 
hence download soon before they get deleted.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of  Spectrum Scale 
(GPFS), then please post it to the public IBM developerWroks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479
. 

If your query concerns a potential software error in Spectrum Scale (GPFS) 
and you have an IBM software maintenance contract please contact 
1-800-237-5511 in the United States or your local IBM Service Center in 
other countries. 

The forum is informally monitored as time permits and should not be used 
for priority messages to the Spectrum Scale (GPFS) team.


From:   "Grunenberg, Renar" <Renar.Grunenberg at huk-coburg.de>
To:     "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:   01/28/2019 12:38 AM
Subject:        Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


Hallo Truong Vu,
 
unfortunality the results are the same, the cmd-responce are not what we 
want. 
Ok, we want to analyze something with the trace facility and came to 
following link in the knowledge center:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_instracsupp.htm
 
The docu mentioned that we must copy to windows files, tracefmt.exe and 
tracelog.exe, but the first one are only available in the DDK-Version 7.1 
(W2K3), not in the WDK Version 8 or 10. We use W2K12. Can you clarify 
where I can find the mentioned files.
 
Regards Renar.
 
Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:
09561 96-44110
Telefax:
09561 96-44104
E-Mail:
Renar.Grunenberg at huk-coburg.de
Internet:
www.huk.de
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter 
Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav 
Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte 
Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich 
erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese 
Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht 
ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information 
in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in 
this information is strictly forbidden.

Von: gpfsug-discuss-bounces at spectrumscale.org [
mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Truong Vu
Gesendet: Donnerstag, 24. Januar 2019 19:18
An: gpfsug-discuss at spectrumscale.org
Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
 
Hi Renar,

Let's see if it is really the /bin/rm is the problem here. Can you run the 
command again without cleanup the temp files as follow:

DEBUG=1 keepTempFiles=1 mmgetstate -a

Thanks,
Tru.


gpfsug-discuss-request---01/23/2019 07:46:30 AM---Send gpfsug-discuss 
mailing list submissions to gpfsug-discuss at spectrumscale.org

From: gpfsug-discuss-request at spectrumscale.org
To: gpfsug-discuss at spectrumscale.org
Date: 01/23/2019 07:46 AM
Subject: gpfsug-discuss Digest, Vol 84, Issue 32
Sent by: gpfsug-discuss-bounces at spectrumscale.org


Send gpfsug-discuss mailing list submissions to
gpfsug-discuss at spectrumscale.org

To subscribe or unsubscribe via the World Wide Web, visit
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
or, via email, send a message with subject or body 'help' to
gpfsug-discuss-request at spectrumscale.org

You can reach the person managing the list at
gpfsug-discuss-owner at spectrumscale.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of gpfsug-discuss digest..."


Today's Topics:

  1. Re: Spectrum Scale Cygwin cmd delays (Grunenberg, Renar)


----------------------------------------------------------------------

Message: 1
Date: Wed, 23 Jan 2019 12:45:39 +0000
From: "Grunenberg, Renar" <Renar.Grunenberg at huk-coburg.de>
To: 'gpfsug main discussion list' <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
Message-ID: <349cb338583a4c1d996677837fc65b6e at SMXRF105.msg.hukrf.de>
Content-Type: text/plain; charset="utf-8"

Hallo All,

as a point to the problem, it seems to be that all the delayes are 
happening here

DEBUG=1 mmgetstate ?a

??..
/bin/rm -f /var/mmfs/gen/mmsdrfs.1256 
/var/mmfs/tmp/allClusterNodes.mmgetstate.1256 
/var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 
/var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 
/var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub 
/var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv 
/var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert 
/var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore 
/var/mmfs/tmp/nodefile.mmgetstate.1256 
/var/mmfs/tmp/diskfile.mmgetstate.1256 
/var/mmfs/tmp/diskNamesFile.mmgetstate.1256

Any points to this it will be fixed in the near future are welcome.

Regards Renar


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:        09561 96-44110
Telefax:        09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet:       www.huk.de
________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter 
Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav 
Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte 
Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich 
erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese 
Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht 
ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information 
in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in 
this information is strictly forbidden.
________________________________
Von: Grunenberg, Renar
Gesendet: Dienstag, 22. Januar 2019 18:10
An: 'gpfsug main discussion list' <gpfsug-discuss at spectrumscale.org>
Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

Hallo Roger,
first thanks fort he tip. But we decided to separate the linux-io-Cluster 
from the Windows client only cluster, because of security requirements and 
ssh management requirements. We can use at this point, local named admins 
on Windows and use on Linux a Deamon and an separated Admin-interface 
Network for pwless root ssh. Your Hint seems to be CCR related or is this 
a Cygwin problem.
@Spectrum Scale Team:
Point1: IP V6 can?t disabled because of applications that want to use 
this. But the mmcmi cmd are give us already the right ipv4 adresses.
Point2. There are no DNS-Issues
Point3: We must check these.
Any recommendations to Rogers statements?

Regards Renar

Von: gpfsug-discuss-bounces at spectrumscale.org<
mailto:gpfsug-discuss-bounces at spectrumscale.org> [
mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye
Gesendet: Dienstag, 22. Januar 2019 16:43
An: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<
mailto:gpfsug-discuss at spectrumscale.org>>
Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

We experienced the same issue and were advised not to use Windows for 
quorum nodes.   We moved our Windows nodes into the storage cluster which 
was entirely Linux and that solved it.   If this is not an option, perhaps 
adding some Linux nodes to your remote cluster as quorum nodes would help.

-Roger


From: gpfsug-discuss-bounces at spectrumscale.org<
mailto:gpfsug-discuss-bounces at spectrumscale.org> [
mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum 
Scale
Sent: Monday, January 21, 2019 5:35 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<
mailto:gpfsug-discuss at spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

Hello Renar,

A few things to try:

 1.  Make sure IPv6 is disabled. On each Windows node, run "mmcmi  host 
<hostname>", with <hostname> being itself and each and every node in the 
cluster. Make sure mmcmi prints valid IPv4 address.

 1.  To eliminate DNS issues, try adding IPv4 entries for each cluster 
node in "c:\windows\system32\drivers\etc\hosts".

 1.  If any anti-virus is active, disable realtime scanning on c:\cygwin64 
 (wherever you installed cygwin 64-bit).

You can also try debugging a script, say: (from GPFS ksh):  DEBUG=1 
mmlscluster, and see what takes time.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of  Spectrum Scale 
(GPFS), then please post it to the public IBM developerWroks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479
.

If your query concerns a potential software error in Spectrum Scale (GPFS) 
and you have an IBM software maintenance contract please contact 
1-800-237-5511 in the United States or your local IBM Service Center in 
other countries.

The forum is informally monitored as time permits and should not be used 
for priority messages to the Spectrum Scale (GPFS) team.


From:        "Grunenberg, Renar" <Renar.Grunenberg at huk-coburg.de<
mailto:Renar.Grunenberg at huk-coburg.de>>
To:        "'gpfsug-discuss at spectrumscale.org'" 
<gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org
>>
Date:        01/21/2019 08:01 AM
Subject:        [gpfsug-discuss] Spectrum Scale Cygwin cmd delays
Sent by:        gpfsug-discuss-bounces at spectrumscale.org<
mailto:gpfsug-discuss-bounces at spectrumscale.org>
________________________________


Hello All,

We test spectrum scale on an windows only Client-Cluster (remote mounted 
to a linux Cluster) but

the execution of mm commands in cygwin is very slow.

We have tried the following adjustments to increase the execution speed.


 *   We have installed Cygwin Server as a service (cygserver-config). 
Unfortunately, this resulted in no faster execution.
 *   Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad
to prevent any DNS problems when accessing ?/cygdrive/...?
 *   Started them as Administrator


All adjustments have so far not led to any improvement. Are there any 
hints to enhance the cmd execution time on windows (w2k12 actual used)


Regards Renar


Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:

09561 96-44110

Telefax:

09561 96-44104

E-Mail:

Renar.Grunenberg at huk-coburg.de<mailto:Renar.Grunenberg at huk-coburg.de>

Internet:

www.huk.de


________________________________
HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter 
Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav 
Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas.
________________________________
Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte 
Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich 
erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese 
Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht 
ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information 
in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in 
this information is strictly forbidden.
________________________________

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-----------------------------------------------------------------------------------

The information in this communication and any attachment is confidential 
and intended solely for the attention and use of the named addressee(s). 
All information and opinions expressed herein are subject to change 
without notice. This communication is not to be construed as an offer to 
sell or the solicitation of an offer to buy any security. Any such offer 
or solicitation can only be made by means of the delivery of a 
confidential private offering memorandum (which should be carefully 
reviewed for a complete description of investment strategies and risks). 
Any reliance one may place on the accuracy or validity of this information 
is at their own risk. Past performance is not necessarily indicative of 
the future results of an investment. All figures are estimated and 
unaudited unless otherwise noted. If you are not the intended recipient, 
or a person responsible for delivering this to the intended recipient, you 
are not authorized to and must not disclose, copy, distribute, o
r retain this message or any part of it. In this case, please notify the 
sender immediately at 713-333-5440
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <
http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20190123/eff7ad74/attachment.html
>

------------------------------

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


End of gpfsug-discuss Digest, Vol 84, Issue 32
**********************************************


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=_PEp_I-F3uzCglEj5raDY1xo2-W6myUCIX1ysChh0lo&s=k9JU3wc7KoJj1VWVVSjjAekQcIEfeJazMkT3BBME-SY&e=


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190128/a3e70929/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190128/a3e70929/attachment.gif>

From cblack at nygenome.org  Tue Jan 29 17:23:49 2019
From: cblack at nygenome.org (Christopher Black)
Date: Tue, 29 Jan 2019 17:23:49 +0000
Subject: [gpfsug-discuss] Querying size of snapshots
Message-ID: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org>

We have some large filesets (PB+) and filesystems where I would like to monitor delete rates and estimate how much space we will get back as snapshots expire. We only keep 3-4 daily snapshots on this filesystem due to churn.
I?ve tried to query the sizes of snapshots using the following command:
mmlssnapshot fsname -d --block-size 1T

However, this has run for over an hour without producing any results. Metadata is all on flash and I?m not sure why this is taking so long.

Does anyone have any insight on this or alternate methods for getting estimates of snapshot sizes?

Best,
Chris

PS I am aware of the warning in docs about the -d option.
________________________________
This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190129/3497aa26/attachment.htm>

From makaplan at us.ibm.com  Tue Jan 29 18:24:17 2019
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Tue, 29 Jan 2019 15:24:17 -0300
Subject: [gpfsug-discuss] Querying size of snapshots
In-Reply-To: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org>
References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org>
Message-ID: <OFB24B7729.036C0088-ON03258391.00641C0B-03258391.006518F2@notes.na.collabserv.com>

1. First off, let's RTFM ...

-d Displays the amount of storage that is used by the snapshot.
This operation requires an amount of time that is proportional to the size 
of the file system; therefore,
it can take several minutes or even hours on a large and heavily-loaded 
file system.
This optional parameter can impact overall system performance. Avoid 
running the mmlssnapshot
command with this parameter frequently or during periods of high file 
system activity.

SOOOO.. there's that. 

2. Next you may ask, HOW is that?

Snapshots are maintained with a "COW" strategy -- They are created 
quickly, essentially just making a record that the snapshot was created 
and at such and such time -- when the snapshot is the same as the "live" 
filesystem...

Then over time, each change to a block of data in live system requires 
that a copy is made of the old data block and that is associated with the 
most recently created snapshot....   SO, as more and more changes are made 
to different blocks over time the snapshot becomes bigger and bigger. How 
big? Well it seems the current implementation does not keep a "simple 
counter" of the number of blocks -- but rather, a list of the blocks that 
were COW'ed.... So when you come and ask "How big"... GPFS has to go 
traverse the file sytem metadata and count those COW'ed blocks....

3. So why not keep a counter?  Well, it's likely not so simple. For 
starters GPFS is typically running concurrently on several or many 
nodes...  And probably was not deemed worth the effort ..... IF a 
convincing case could be made, I'd bet there is a way... to at least keep 
approximate numbers, log records, exact updates periodically, etc, etc -- 
similar to the way space allocation and accounting is done for the live 
file system...


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190129/b7a5e327/attachment.htm>

From cblack at nygenome.org  Tue Jan 29 18:43:24 2019
From: cblack at nygenome.org (Christopher Black)
Date: Tue, 29 Jan 2019 18:43:24 +0000
Subject: [gpfsug-discuss] Querying size of snapshots
In-Reply-To: <OFB24B7729.036C0088-ON03258391.00641C0B-03258391.006518F2@notes.na.collabserv.com>
References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org>
	<OFB24B7729.036C0088-ON03258391.00641C0B-03258391.006518F2@notes.na.collabserv.com>
Message-ID: <369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org>

Thanks for the quick and detailed reply! I had read the manual and was aware of the warnings about -d (mentioned in my PS).
On systems with high churn (lots of temporary files, lots of big and small deletes along with many new files), I?ve previously used estimates of snapshot size as a useful signal on whether we can expect to see an increase in available space over the next few days as snapshots expire. I?ve used this technique on a few different more mainstream storage systems, but never on gpfs.
I?d find it useful to have a similar way to monitor ?space to be freed pending snapshot deletes? on gpfs. It sounds like there is not an existing solution for this so it would be a request for enhancement.
I?m not sure how much overhead there would be keeping a running counter for blocks changed since snapshot creation or if that would completely fall apart on large systems or systems with many snapshots. If that is a consideration even having only an estimate for the oldest snapshot would be useful, but I realize that can depend on all the other later snapshots as well. Perhaps an overall ?size of all snapshots? would be easier to manage and would still be useful to us.
I don?t need this number to be 100% accurate, but a low or floor estimate would be very useful.

Is anyone else interested in this? Do other people have other ways to estimate how much space they will get back as snapshots expire? Is there a more efficient way of making such an estimate available to admins other than running an mmlssnapshot -d every night and recording the output?

Thanks all!
Chris

From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Marc A Kaplan <makaplan at us.ibm.com>
Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Tuesday, January 29, 2019 at 1:24 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Querying size of snapshots

1. First off, let's RTFM ...

-d Displays the amount of storage that is used by the snapshot.
This operation requires an amount of time that is proportional to the size of the file system; therefore,
it can take several minutes or even hours on a large and heavily-loaded file system.
This optional parameter can impact overall system performance. Avoid running the mmlssnapshot
command with this parameter frequently or during periods of high file system activity.

SOOOO.. there's that.

2. Next you may ask, HOW is that?

Snapshots are maintained with a "COW" strategy -- They are created quickly, essentially just making a record that the snapshot was created and at such and such time -- when the snapshot is the same as the "live" filesystem...

Then over time, each change to a block of data in live system requires that a copy is made of the old data block and that is associated with the most recently created snapshot....   SO, as more and more changes are made to different blocks over time the snapshot becomes bigger and bigger.   How big? Well it seems the current implementation does not keep a "simple counter" of the number of blocks -- but rather, a list of the blocks that were COW'ed.... So when you come and ask "How big"... GPFS has to go traverse the file sytem metadata and count those COW'ed blocks....

3. So why not keep a counter?  Well, it's likely not so simple. For starters GPFS is typically running concurrently on several or many nodes...  And probably was not deemed worth the effort ..... IF a convincing case could be made, I'd bet there is a way... to at least keep approximate numbers, log records, exact updates periodically, etc, etc -- similar to the way space allocation and accounting is done for the live file system...


________________________________
This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190129/8ae420bf/attachment.htm>

From janfrode at tanso.net  Tue Jan 29 19:19:12 2019
From: janfrode at tanso.net (Jan-Frode Myklebust)
Date: Tue, 29 Jan 2019 20:19:12 +0100
Subject: [gpfsug-discuss] Querying size of snapshots
In-Reply-To: <369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org>
References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org>
	<OFB24B7729.036C0088-ON03258391.00641C0B-03258391.006518F2@notes.na.collabserv.com>
	<369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org>
Message-ID: <CAHwPathC3Ku6K-9=8xhN6Dfj_aFgL42ODo7s+KQuGugriP6cPA@mail.gmail.com>

You could put snapshot data in a separate storage pool. Then it should be
visible how much space it occupies, but it?s a bit hard to see how this
will be usable/manageable..


-jf
tir. 29. jan. 2019 kl. 20:08 skrev Christopher Black <cblack at nygenome.org>:

> Thanks for the quick and detailed reply! I had read the manual and was
> aware of the warnings about -d (mentioned in my PS).
>
> On systems with high churn (lots of temporary files, lots of big and small
> deletes along with many new files), I?ve previously used estimates of
> snapshot size as a useful signal on whether we can expect to see an
> increase in available space over the next few days as snapshots expire.
> I?ve used this technique on a few different more mainstream storage
> systems, but never on gpfs.
>
> I?d find it useful to have a similar way to monitor ?space to be freed
> pending snapshot deletes? on gpfs. It sounds like there is not an existing
> solution for this so it would be a request for enhancement.
>
> I?m not sure how much overhead there would be keeping a running counter
> for blocks changed since snapshot creation or if that would completely fall
> apart on large systems or systems with many snapshots. If that is a
> consideration even having only an estimate for the oldest snapshot would be
> useful, but I realize that can depend on all the other later snapshots as
> well. Perhaps an overall ?size of all snapshots? would be easier to manage
> and would still be useful to us.
>
> I don?t need this number to be 100% accurate, but a low or floor estimate
> would be very useful.
>
>
>
> Is anyone else interested in this? Do other people have other ways to
> estimate how much space they will get back as snapshots expire? Is there a
> more efficient way of making such an estimate available to admins other
> than running an mmlssnapshot -d every night and recording the output?
>
>
>
> Thanks all!
>
> Chris
>
>
>
> *From: *<gpfsug-discuss-bounces at spectrumscale.org> on behalf of Marc A
> Kaplan <makaplan at us.ibm.com>
> *Reply-To: *gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> *Date: *Tuesday, January 29, 2019 at 1:24 PM
> *To: *gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> *Subject: *Re: [gpfsug-discuss] Querying size of snapshots
>
>
>
> 1. First off, let's RTFM ...
>
> *-d *Displays the amount of storage that is used by the snapshot.
> This operation requires an amount of time that is proportional to the size
> of the file system; therefore,
> it can take several minutes or even hours on a large and heavily-loaded
> file system.
> This optional parameter can impact overall system performance. Avoid
> running the * mmlssnapshot*
> command with this parameter frequently or during periods of high file
> system activity.
>
> SOOOO.. there's that.
>
> 2. Next you may ask, HOW is that?
>
> Snapshots are maintained with a "COW" strategy -- They are created
> quickly, essentially just making a record that the snapshot was created and
> at such and such time -- when the snapshot is the same as the "live"
> filesystem...
>
> Then over time, each change to a block of data in live system requires
> that a copy is made of the old data block and that is associated with the
> most recently created snapshot....   SO, as more and more changes are made
> to different blocks over time the snapshot becomes bigger and bigger.   How
> big? Well it seems the current implementation does not keep a "simple
> counter" of the number of blocks -- but rather, a list of the blocks that
> were COW'ed.... So when you come and ask "How big"... GPFS has to go
> traverse the file sytem metadata and count those COW'ed blocks....
>
> 3. So why not keep a counter?  Well, it's likely not so simple. For
> starters GPFS is typically running concurrently on several or many
> nodes...  And probably was not deemed worth the effort ..... IF a
> convincing case could be made, I'd bet there is a way... to at least keep
> approximate numbers, log records, exact updates periodically, etc, etc --
> similar to the way space allocation and accounting is done for the live
> file system...
>
>
> ------------------------------
> This message is for the recipient?s use only, and may contain
> confidential, privileged or protected information. Any unauthorized use or
> dissemination of this communication is prohibited. If you received this
> message in error, please immediately notify the sender and destroy all
> copies of this message. The recipient should check this email and any
> attachments for the presence of viruses, as we accept no liability for any
> damage caused by any virus transmitted by this email.
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190129/1234a522/attachment.htm>

From olaf.weiser at de.ibm.com  Tue Jan 29 21:37:08 2019
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Tue, 29 Jan 2019 22:37:08 +0100
Subject: [gpfsug-discuss] Querying size of snapshots
In-Reply-To: <CAHwPathC3Ku6K-9=8xhN6Dfj_aFgL42ODo7s+KQuGugriP6cPA@mail.gmail.com>
References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org><OFB24B7729.036C0088-ON03258391.00641C0B-03258391.006518F2@notes.na.collabserv.com><369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org>
	<CAHwPathC3Ku6K-9=8xhN6Dfj_aFgL42ODo7s+KQuGugriP6cPA@mail.gmail.com>
Message-ID: <OFD4231458.8717C34A-ONC1258391.00768225-C1258391.0076C1B4@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190129/6257e599/attachment.htm>

From alvise.dorigo at psi.ch  Wed Jan 30 13:16:22 2019
From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI))
Date: Wed, 30 Jan 2019 13:16:22 +0000
Subject: [gpfsug-discuss] Unbalanced pdisk free space
Message-ID: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch>

Hello,
I've a Lenovo Spectrum Scale system DSS-G220 (software dss-g-2.0a) composed of
2x x3560 M5 IO server nodes
1x x3550 M5 client/support node
2x disk enclosures D3284
GPFS/GNR 4.2.3-7

Can anybody tell me if it is normal that all the pdisks of both my recovery groups, residing on the same physical enclosure have free space equal to (more or less) 1/3 of the free space of the pdisks residing on the other physical enclosure (see attached text files for the command line output) ?

I guess when the least free disks are fully occupied (while the others are still partially free) write performance will drop by a factor of two. Correct ?
Is there a way (considering that the system is in production) to fix (rebalance) this free space among all pdisk of both enclosures ?

Should I open a PMR to IBM ?

Many thanks,

   Alvise


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190130/13fcf416/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rg1
Type: application/octet-stream
Size: 13340 bytes
Desc: rg1
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190130/13fcf416/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rg2
Type: application/octet-stream
Size: 13340 bytes
Desc: rg2
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190130/13fcf416/attachment-0001.obj>

From abeattie at au1.ibm.com  Wed Jan 30 14:53:47 2019
From: abeattie at au1.ibm.com (Andrew Beattie)
Date: Wed, 30 Jan 2019 14:53:47 +0000
Subject: [gpfsug-discuss] Unbalanced pdisk free space
In-Reply-To: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch>
References: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch>
Message-ID: <OF5CB03F8F.FA2BF6CB-ON00258392.0051A73E-00258392.0051D44E@notes.na.collabserv.com>

An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190130/f7fd19d1/attachment.htm>

From scale at us.ibm.com  Wed Jan 30 20:25:20 2019
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Wed, 30 Jan 2019 15:25:20 -0500
Subject: [gpfsug-discuss] Unbalanced pdisk free space
In-Reply-To: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch>
References: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch>
Message-ID: <OF6D3E0211.DC0CF7C4-ON85258392.006EB6C4-85258392.00702F0C@notes.na.collabserv.com>

Alvise,

Could you send us the output of the following commands from both server 
nodes.
mmfsadm dump nspdclient > /tmp/dump_nspdclient.<nodeName> 
mmfsadm dump pdisk   > /tmp/dump_pdisk.<nodeName>

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of  Spectrum Scale 
(GPFS), then please post it to the public IBM developerWroks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479
. 

If your query concerns a potential software error in Spectrum Scale (GPFS) 
and you have an IBM software maintenance contract please contact 
1-800-237-5511 in the United States or your local IBM Service Center in 
other countries. 

The forum is informally monitored as time permits and should not be used 
for priority messages to the Spectrum Scale (GPFS) team.


From:   "Dorigo Alvise (PSI)" <alvise.dorigo at psi.ch>
To:     "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:   01/30/2019 08:24 AM
Subject:        [gpfsug-discuss] Unbalanced pdisk free space
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


Hello,
I've a Lenovo Spectrum Scale system DSS-G220 (software dss-g-2.0a) 
composed of
2x x3560 M5 IO server nodes
1x x3550 M5 client/support node
2x disk enclosures D3284
GPFS/GNR 4.2.3-7

Can anybody tell me if it is normal that all the pdisks of both my 
recovery groups, residing on the same physical enclosure have free space 
equal to (more or less) 1/3 of the free space of the pdisks residing on 
the other physical enclosure (see attached text files for the command line 
output) ?

I guess when the least free disks are fully occupied (while the others are 
still partially free) write performance will drop by a factor of two. 
Correct ? 
Is there a way (considering that the system is in production) to fix 
(rebalance) this free space among all pdisk of both enclosures ?

Should I open a PMR to IBM ?

Many thanks,

   Alvise

[attachment "rg1" deleted by Brian Herr/Poughkeepsie/IBM] [attachment 
"rg2" deleted by Brian Herr/Poughkeepsie/IBM] 
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=QDZ-afehEgpYi3JGRd8q6rHgo4rb8gVu_VKQwg4MwEs&s=5bEFHRU7zk-nRK_d20vJBngQOOkSLWT1vvtcDNKD584&e=


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190130/2803fac4/attachment.htm>

From Robert.Oesterlin at nuance.com  Wed Jan 30 20:51:49 2019
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Wed, 30 Jan 2019 20:51:49 +0000
Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?=
 =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?=
Message-ID: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com>

Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190130/eb8b321b/attachment.htm>

From Paul.Sanchez at deshaw.com  Wed Jan 30 21:02:26 2019
From: Paul.Sanchez at deshaw.com (Sanchez, Paul)
Date: Wed, 30 Jan 2019 21:02:26 +0000
Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?=
 =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?=
In-Reply-To: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com>
References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com>
Message-ID: <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com>

There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch).  But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted.  You could also peek in /tmp/mmfs and only report if you find something there.  That said, the docs say that preStartup fires after the node joins the cluster.  So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem.

We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately.  Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens.

Thx
Paul

From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Oesterlin, Robert
Sent: Wednesday, January 30, 2019 3:52 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback?

Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190130/c6aedd86/attachment.htm>

From makaplan at us.ibm.com  Wed Jan 30 21:16:51 2019
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Wed, 30 Jan 2019 18:16:51 -0300
Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?=
 =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?=
In-Reply-To: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com>
References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com>
Message-ID: <OFA150EA60.68C10849-ON03258392.0074BEA7-03258392.0074E49F@notes.na.collabserv.com>

We have (pre)shutdown and pre(startup) ...
Trap  and record both... If you see a startup without a matching shutdown 
you know the shutdown never happened, because GPFS crashed.


From:   "Oesterlin, Robert" <Robert.Oesterlin at nuance.com>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   01/30/2019 05:52 PM
Subject:        [gpfsug-discuss] Node ?crash and restart? event using GPFS 
callback?
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


Anyone crafted a good way to detect a node ?crash and restart? event using 
GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the 
best. What I?m really looking for is did the node shutdown (abort) and 
create a dump in /tmp/mmfs
 
 
Bob Oesterlin
Sr Principal Storage Engineer, Nuance
 _______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=oBQHDWo5PVKthJjmbVrQyqSrkuFZEcMQb_tXtvcKepE&s=HfF_wArTvc-i4wLfATXbwrImRT-w0mKG8mhctBJFLCI&e=


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190130/7497d385/attachment.htm>

From Dwayne.Hart at med.mun.ca  Wed Jan 30 21:52:48 2019
From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca)
Date: Wed, 30 Jan 2019 21:52:48 +0000
Subject: [gpfsug-discuss]
 =?windows-1252?q?Node_=91crash_and_restart=92_ev?=
 =?windows-1252?q?ent_using_GPFS_callback=3F?=
In-Reply-To: <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com>
References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com>,
	<9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com>
Message-ID: <063B3F21-8695-4454-8D1A-B1734B1AD436@med.mun.ca>

Could you get away with running ?mmdiag ?stats? and inspecting the uptime information it provides?

Best,
Dwayne
?
Dwayne Hart | Systems Administrator IV

CHIA, Faculty of Medicine
Memorial University of Newfoundland
300 Prince Philip Drive
St. John?s, Newfoundland | A1B 3V6
Craig L Dobbin Building | 4M409
T 709 864 6631

On Jan 30, 2019, at 5:32 PM, Sanchez, Paul <Paul.Sanchez at deshaw.com<mailto:Paul.Sanchez at deshaw.com>> wrote:

There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch).  But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted.  You could also peek in /tmp/mmfs and only report if you find something there.  That said, the docs say that preStartup fires after the node joins the cluster.  So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem.

We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately.  Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens.

Thx
Paul

From: gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org> <gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>> On Behalf Of Oesterlin, Robert
Sent: Wednesday, January 30, 2019 3:52 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback?

Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190130/0b068d6c/attachment.htm>

From Robert.Oesterlin at nuance.com  Thu Jan 31 01:19:47 2019
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Thu, 31 Jan 2019 01:19:47 +0000
Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?=
 =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?=
Message-ID: <554E186D-30BD-4E7D-859C-339F5DDAD442@nuance.com>

Actually, I think ?preShutdown? will do it since it passes the type of shutdown ?abnormal? for a crash to the call back - I can use that to send a Slack message.

mmaddcallback node-abort --event preShutdown --command /usr/local/sbin/callback-test.sh --parms "%eventName %reason"

and you get either:

preShutdown normal
preShutdown abnormal


Bob Oesterlin
Sr Principal Storage Engineer, Nuance


From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Marc A Kaplan <makaplan at us.ibm.com>
Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Wednesday, January 30, 2019 at 3:17 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: [EXTERNAL] Re: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback?

We have (pre)shutdown and pre(startup) ...
Trap  and record both... If you see a startup without a matching shutdown you know the shutdown never happened, because GPFS crashed.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190131/099a6951/attachment.htm>

From alastair.smith at ucl.ac.uk  Wed Jan 30 14:11:08 2019
From: alastair.smith at ucl.ac.uk (Smith, Alastair)
Date: Wed, 30 Jan 2019 14:11:08 +0000
Subject: [gpfsug-discuss] Job opportunity at UCL Research Data Services
Message-ID: <AM4PR0101MB2226078A9AF5BDFC06CD45CAC8900@AM4PR0101MB2226.eurprd01.prod.exchangelabs.com>

Dear List Members,

We would like to draw you attention to a job opportunity at UCL for a Senior Research Data Systems Engineer. The is a technical role in the Research Data Services Group, part of UCL's large and well-established Research IT Services team. The Senior Data Systems Engineer leads the development of technical strategy for Research Data Services at UCL. The successful applicant will ensure that appropriate technologies and workflows are used to address research data management requirements across the institution, particularly those relating to data storage and access.
The Research Data Services Group provides petabyte-scale data storage for active research projects, and is about to launch a long-term data repository service. Over the coming years, the Group will be building an integrated suite of services to support data management from planning to re-use, and the successful candidate will play an important role in the design and operation of these services.
The post comes with a competitive salary and a central London working location.
The closing date for applications it 2nd February.

Further particulars and a link to the application form are available from   https://tinyurl.com/ucljobs-rdse.


-|-|-|-|-|-|-|-|-|-|-|-|-|-

Dr Alastair Smith
Senior research data systems engineer
Research Data Services, RITS
Information Services Division
University College London
1 St Martin's- Le-Grand
London
EC1A 4AS
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190130/de63aa52/attachment.htm>

From alvise.dorigo at psi.ch  Thu Jan 31 09:48:12 2019
From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI))
Date: Thu, 31 Jan 2019 09:48:12 +0000
Subject: [gpfsug-discuss] Unbalanced pdisk free space
In-Reply-To: <OF6D3E0211.DC0CF7C4-ON85258392.006EB6C4-85258392.00702F0C@notes.na.collabserv.com>
References: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch>,
	<OF6D3E0211.DC0CF7C4-ON85258392.006EB6C4-85258392.00702F0C@notes.na.collabserv.com>
Message-ID: <83A6EEB0EC738F459A39439733AE8045267E32C0@MBX114.d.ethz.ch>

They're attached.

Thanks!

   Alvise

________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of IBM Spectrum Scale [scale at us.ibm.com]
Sent: Wednesday, January 30, 2019 9:25 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Unbalanced pdisk free space

Alvise,

Could you send us the output of the following commands from both server nodes.

  *   mmfsadm dump nspdclient > /tmp/dump_nspdclient.<nodeName>
  *   mmfsadm dump pdisk   > /tmp/dump_pdisk.<nodeName>
  *

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of  Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact  1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.


From:        "Dorigo Alvise (PSI)" <alvise.dorigo at psi.ch>
To:        "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date:        01/30/2019 08:24 AM
Subject:        [gpfsug-discuss] Unbalanced pdisk free space
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________


Hello,
I've a Lenovo Spectrum Scale system DSS-G220 (software dss-g-2.0a) composed of
2x x3560 M5 IO server nodes
1x x3550 M5 client/support node
2x disk enclosures D3284
GPFS/GNR 4.2.3-7

Can anybody tell me if it is normal that all the pdisks of both my recovery groups, residing on the same physical enclosure have free space equal to (more or less) 1/3 of the free space of the pdisks residing on the other physical enclosure (see attached text files for the command line output) ?

I guess when the least free disks are fully occupied (while the others are still partially free) write performance will drop by a factor of two. Correct ?
Is there a way (considering that the system is in production) to fix (rebalance) this free space among all pdisk of both enclosures ?

Should I open a PMR to IBM ?

Many thanks,

   Alvise

[attachment "rg1" deleted by Brian Herr/Poughkeepsie/IBM] [attachment "rg2" deleted by Brian Herr/Poughkeepsie/IBM] _______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190131/2018fa6a/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dump_nspdclient.sf-dssio-1
Type: application/octet-stream
Size: 570473 bytes
Desc: dump_nspdclient.sf-dssio-1
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190131/2018fa6a/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dump_nspdclient.sf-dssio-2
Type: application/octet-stream
Size: 566924 bytes
Desc: dump_nspdclient.sf-dssio-2
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190131/2018fa6a/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dump_pdisk.sf-dssio-1
Type: application/octet-stream
Size: 682312 bytes
Desc: dump_pdisk.sf-dssio-1
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190131/2018fa6a/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dump_pdisk.sf-dssio-2
Type: application/octet-stream
Size: 619497 bytes
Desc: dump_pdisk.sf-dssio-2
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190131/2018fa6a/attachment-0003.obj>

From heiner.billich at psi.ch  Thu Jan 31 14:56:21 2019
From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI))
Date: Thu, 31 Jan 2019 14:56:21 +0000
Subject: [gpfsug-discuss] Token manager - how to monitor performance?
Message-ID: <02FE0AE6-BDDC-4E10-9C41-E68EB91758AA@psi.ch>

Hello,
Sorry for coming up with this never-ending story. I know that token management is mainly autoconfigured and even the placement of token manager nodes is no longer under user control in all cases. Still I would like to monitor this component to see if we are close to some limit like memory or rpc rate. Especially as we?ll do some major changes to our setup soon.
I would like to monitor the performance of our token manager nodes to get warned _before_ we get performance issues. Any advice is welcome.
Ideally I would like collect some numbers and pass them on to influxdb or similar. I didn?t find anything in perfmon/zimon that seemed to match. I could imagine that numbers like ?number of active tokens? and ?number of token operations? per manager would be helpful. Or ?# of rpc calls per second?.  And maybe ?number of open files?, ?number of token operations?, ?number of tokens? for clients.  And maybe some percentage of used token memory ? and cache hit ratio ?
This would also help to tune ? like if a client does very many token operations or rpc calls maybe I should increase maxFilesToCache.
 The above is just to illustrate, as token management is complicated the really valuable metrics may be different.
Or am I too anxious and should wait and see instead?
cheers,
Heiner

Heiner Billich
--
Paul Scherrer Institut
Heiner Billich
System Engineer Scientific Computing
Science IT / High Performance Computing
WHGA/106
Forschungsstrasse 111
5232 Villigen PSI
Switzerland

Phone +41 56 310 36 02
heiner.billich at psi.ch<mailto:heiner.billich at psi.ch>
https://www.psi.ch


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190131/dcfd412b/attachment.htm>

From TOMP at il.ibm.com  Thu Jan 31 15:11:24 2019
From: TOMP at il.ibm.com (Tomer Perry)
Date: Thu, 31 Jan 2019 17:11:24 +0200
Subject: [gpfsug-discuss] Token manager - how to monitor performance?
In-Reply-To: <02FE0AE6-BDDC-4E10-9C41-E68EB91758AA@psi.ch>
References: <02FE0AE6-BDDC-4E10-9C41-E68EB91758AA@psi.ch>
Message-ID: <OF84DDDE38.D7B16626-ONC2258393.00535481-C2258393.00537107@notes.na.collabserv.com>

Hi,

I agree that we should potentially add mode metrics, but for a start, I 
would look into mmdiag --memory and mmdiag --tokenmgr (the latter show 
different output on a token server).


Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: tomp at il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:    +1 720 3422758
Israel Tel:      +972 3 9188625
Mobile:         +972 52 2554625


From:   "Billich Heinrich Rainer (PSI)" <heiner.billich at psi.ch>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   31/01/2019 16:56
Subject:        [gpfsug-discuss] Token manager - how to monitor 
performance?
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


Hello,
Sorry for coming up with this never-ending story. I know that token 
management is mainly autoconfigured and even the placement of token 
manager nodes is no longer under user control in all cases. Still I would 
like to monitor this component to see if we are close to some limit like 
memory or rpc rate. Especially as we?ll do some major changes to our setup 
soon.
I would like to monitor the performance of our token manager nodes to get 
warned _before_ we get performance issues. Any advice is welcome. 
Ideally I would like collect some numbers and pass them on to influxdb or 
similar. I didn?t find anything in perfmon/zimon that seemed to match. I 
could imagine that numbers like ?number of active tokens? and ?number of 
token operations? per manager would be helpful. Or ?# of rpc calls per 
second?.  And maybe ?number of open files?, ?number of token operations?, 
?number of tokens? for clients.  And maybe some percentage of used token 
memory ? and cache hit ratio ?
This would also help to tune ? like if a client does very many token 
operations or rpc calls maybe I should increase maxFilesToCache. 
 The above is just to illustrate, as token management is complicated the 
really valuable metrics may be different.
Or am I too anxious and should wait and see instead?
cheers,
Heiner
 
Heiner Billich
--
Paul Scherrer Institut
Heiner Billich 
System Engineer Scientific Computing
Science IT / High Performance Computing 
WHGA/106 
Forschungsstrasse 111
5232 Villigen PSI
Switzerland
 
Phone +41 56 310 36 02
heiner.billich at psi.ch 
https://www.psi.ch
 
 
 _______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=J5n3Wsk1f6CsyL867jkmS3P2BYZDfkPS6GB9dShnYcI&s=YFTWUM3MQu8C1MitRnyPnYQ_wMtjj3Uwmif6gJUoLgc&e=


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190131/f0650517/attachment.htm>

From Kevin.Buterbaugh at Vanderbilt.Edu  Wed Jan 30 21:15:48 2019
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Wed, 30 Jan 2019 21:15:48 +0000
Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?=
 =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?=
In-Reply-To: <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com>
References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com>
	<9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com>
Message-ID: <CBC3968C-C238-498A-A59C-707FCD124FE8@vanderbilt.edu>

Hi Bob,

We use the nodeLeave callback to detect node expels ? for what you?re wanting to do I wonder if nodeJoin might work??  If a node joins the cluster and then has an uptime of a few minutes you could go looking in /tmp/mmfs.  HTH...

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633

On Jan 30, 2019, at 3:02 PM, Sanchez, Paul <Paul.Sanchez at deshaw.com<mailto:Paul.Sanchez at deshaw.com>> wrote:

There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch).  But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted.  You could also peek in /tmp/mmfs and only report if you find something there.  That said, the docs say that preStartup fires after the node joins the cluster.  So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem.

We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately.  Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens.

Thx
Paul

From: gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org> <gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>> On Behalf Of Oesterlin, Robert
Sent: Wednesday, January 30, 2019 3:52 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback?

Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org/>
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&amp;data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cccd012a939124326a53908d686f64117%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636844789557921185&amp;sdata=9bMPd%2F%2B%2Babt6IdeFYcdznPBQwPrMLFsXHTBYISlyYGM%3D&amp;reserved=0

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190130/365b023a/attachment.htm>

From makaplan at us.ibm.com  Thu Jan 31 15:40:50 2019
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Thu, 31 Jan 2019 12:40:50 -0300
Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?=
 =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?=
In-Reply-To: <CBC3968C-C238-498A-A59C-707FCD124FE8@vanderbilt.edu>
References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com><9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com>
	<CBC3968C-C238-498A-A59C-707FCD124FE8@vanderbilt.edu>
Message-ID: <OF3D8901E5.E0676BE0-ON03258393.0055D32B-03258393.00562179@notes.na.collabserv.com>

Various "leave" / join events may be interesting ... But you've got to 
consider that an abrupt failure of several nodes is not necessarily 
recorded anywhere! For example, because the would be recording devices 
might all lose power at the same time.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190131/94556afb/attachment.htm>

From Robert.Oesterlin at nuance.com  Thu Jan 31 15:46:38 2019
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Thu, 31 Jan 2019 15:46:38 +0000
Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?=
 =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?=
In-Reply-To: <CBC3968C-C238-498A-A59C-707FCD124FE8@vanderbilt.edu>
References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com>
	<9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com>
	<CBC3968C-C238-498A-A59C-707FCD124FE8@vanderbilt.edu>
Message-ID: <572FF01C-A82D-45FD-AB34-A897BFE59325@nuance.com>

A better way to detect node expels is to install the expelnode into /var/mmfs/etc/ (sample in /usr/lpp/mmfs/samples/expelnode.sample) - put this on your manager nodes. It runs on every expel and you can customize it pretty easily. We generate a Slack message to a specific channel:

GPFS Node Expel nrg1 APP [1:56 AM] nrg1-gpfs01 Expelling node gnj-r05r05u30, other node cnt-r04r08u40


Bob Oesterlin
Sr Principal Storage Engineer, Nuance


From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Thursday, January 31, 2019 at 9:19 AM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: [EXTERNAL] Re: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback?

Hi Bob,

We use the nodeLeave callback to detect node expels ? for what you?re wanting to do I wonder if nodeJoin might work??  If a node joins the cluster and then has an uptime of a few minutes you could go looking in /tmp/mmfs.  HTH...

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633


On Jan 30, 2019, at 3:02 PM, Sanchez, Paul <Paul.Sanchez at deshaw.com<mailto:Paul.Sanchez at deshaw.com>> wrote:

There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch).  But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted.  You could also peek in /tmp/mmfs and only report if you find something there.  That said, the docs say that preStartup fires after the node joins the cluster.  So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem.

We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately.  Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens.

Thx
Paul

From: gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org> <gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>> On Behalf Of Oesterlin, Robert
Sent: Wednesday, January 30, 2019 3:52 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback?

Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<https://urldefense.proofpoint.com/v2/url?u=http-3A__spectrumscale.org_&d=DwMGaQ&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=ppdUpGql5rzClFCWb7wAesP1sZuy9scOloPIQsjrVao&s=O81UdWPCUrX00RF0P-UNyLZ-lbTmgIaW-PpK4VrxgHs&e=>
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&amp;data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cccd012a939124326a53908d686f64117%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636844789557921185&amp;sdata=9bMPd%2F%2B%2Babt6IdeFYcdznPBQwPrMLFsXHTBYISlyYGM%3D&amp;reserved=0<https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttp-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss-26amp-3Bdata-3D02-257C01-257CKevin.Buterbaugh-2540vanderbilt.edu-257Cccd012a939124326a53908d686f64117-257Cba5a7f39e3be4ab3b45067fa80faecad-257C0-257C0-257C636844789557921185-26amp-3Bsdata-3D9bMPd-252F-252B-252Babt6IdeFYcdznPBQwPrMLFsXHTBYISlyYGM-253D-26amp-3Breserved-3D0&d=DwMGaQ&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=ppdUpGql5rzClFCWb7wAesP1sZuy9scOloPIQsjrVao&s=ZaQTKkyDzA6XWNjMVXKrblv1I7frC1VIVFQ0Y-I1f8c&e=>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190131/d1f94d56/attachment.htm>

From chair at spectrumscale.org  Thu Jan 31 20:44:25 2019
From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair))
Date: Thu, 31 Jan 2019 20:44:25 +0000
Subject: [gpfsug-discuss] Call for input & save the date
Message-ID: <213C4D17-C0D2-4883-834F-7E2E00B4EE3F@spectrumscale.org>

Hi All,

 
We?ve just published the main dates for 2019 Spectrum Scale meetings on the user group website at:

https://www.spectrumscaleug.org/

 
Please take a look over the list of events and pencil them in your diary! (some of those later in the year are tentative and there are a couple more that might get added in some other territories).

 
Myself, Kristy, Bob, Chris and Ulf are currently having some discussion on the topics we?d like to have covered in the various user group meetings. If you have any specific topics you?d like to hear about, then please let me know in the next few days? we can?t promise we can get a speaker, but if you don?t let us know we can?t try!

 
As usual, we?ll be looking for user speakers for all of our events. The user group events only work well if we have people talking about their uses of Spectrum Scale, so please think about offering a talk and let us know!

 
Thanks

 
Simon

UK Group Chair

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20190131/9adcae65/attachment.htm>